EXECUTIVE SUMMARY


This  research  used  the  second  year  (1979-80)  of  the  Section  15  statistics,  first 
to  test  the  validity  of  a  small  set  of  performance  indicators  for  fixed-route  bus 
operations,  and  second  to  define  relatively  homogeneous  groups  of  operators  (peer 
groups)  that  can  be  compared.  Agencies  operating  304  bus  systems  were  included. 
Rail  operations  were  excluded,  as  were  exclusive,  demand-responsive  operations. 

Data  Preparation 

Chapter  1  reviews  the  data  and  the  methods  used  to  correct  problems  and 
reformat data for statistical analysis. The second-year data are both more complete
and more accurate than those reported for the inaugural year. However, data from the
magnetic  tape  had  to  be  reorganized  and  validated  before  they  could  be  used  with 
any  of  the  major  statistical  software  packages. 

Performance  Indicators 

Chapter  2  analyzes  a  large  set  of  performance  variables  in  conjunction  with 
factor  analysis  to  establish  seven  dimensions  of  transit  performance.  Seven  marker 
indicators  were  chosen  rather  than  the  nine  proposed  in  previous  research.  The 
seven  marker  variables  best  representing  the  performance  concepts  are: 
(RVH/OEXP) Revenue Vehicle Hours per Operating Expense
(TPAS/RVH)  Unlinked  Passenger  Trips  per  Revenue  Vehicle  Hour 
(OREV/OEXP)  Operating  Revenue  per  Operating  Expense 
(TVH/EMP)  Total  Vehicle  Hours  per  Total  Employees 
(TVM/PVEH)  Total  Vehicle  Miles  per  Peak  Vehicle 
(TVM/MNT)  Total  Vehicle  Miles  per  Maintenance  Employee 
(TVM/ACC)  Total  Vehicle  Miles  per  Accident 
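
As an illustration only, the seven marker indicators listed above can be computed as simple ratios once the Section 15 items for a system have been assembled into one record. The sketch below is in Python. The record layout, the field names and the `safe_ratio` helper are assumptions made for this example; they are not part of the Section 15 reporting format or of the software used in this research, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class SystemRecord:
    """Hypothetical flat record for one bus system (field names are illustrative)."""
    revenue_vehicle_hours: float
    operating_expense: float
    unlinked_passenger_trips: float
    operating_revenue: float
    total_vehicle_hours: float
    total_employees: float
    total_vehicle_miles: float
    peak_vehicles: float
    maintenance_employees: float
    accidents: float


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero or missing."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def marker_indicators(r):
    """Compute the seven marker indicators, keyed by the abbreviations in the text."""
    return {
        "RVH/OEXP": safe_ratio(r.revenue_vehicle_hours, r.operating_expense),
        "TPAS/RVH": safe_ratio(r.unlinked_passenger_trips, r.revenue_vehicle_hours),
        "OREV/OEXP": safe_ratio(r.operating_revenue, r.operating_expense),
        "TVH/EMP": safe_ratio(r.total_vehicle_hours, r.total_employees),
        "TVM/PVEH": safe_ratio(r.total_vehicle_miles, r.peak_vehicles),
        "TVM/MNT": safe_ratio(r.total_vehicle_miles, r.maintenance_employees),
        "TVM/ACC": safe_ratio(r.total_vehicle_miles, r.accidents),
    }


# Invented values for a small system; zero accidents leaves TVM/ACC undefined.
example = SystemRecord(
    revenue_vehicle_hours=100000.0,
    operating_expense=4000000.0,
    unlinked_passenger_trips=2500000.0,
    operating_revenue=1200000.0,
    total_vehicle_hours=110000.0,
    total_employees=200.0,
    total_vehicle_miles=1500000.0,
    peak_vehicles=50.0,
    maintenance_employees=30.0,
    accidents=0.0,
)
example_indicators = marker_indicators(example)
```

Returning `None` for an undefined ratio, rather than zero, keeps a system with no reported accidents from being treated as the worst performer on the safety indicator.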

Peer  Group  Typology 

Chapter  3  describes  the  use  of  cluster  analysis  to  create  a  typology  for  transit 
based  upon  characteristics  of  operations  that  are  available  in  the  Section  15 
statistics.  Agency  size  (measured  by  total  vehicle  miles  and  number  of  peak 
vehicles  operated),  peak  to  base  demand  and  average  bus  speed  are  used  to  create 
twelve peer groups. This new typology updates and supersedes the typology briefly
described in 1982. Each peer group was shown to be distinct in its operating
characteristics, both through statistical analyses and descriptively, as follows:

The private bus companies in Peer Group 1 stand out because of their extremely
high  average  speed.  They  are  the  smallest  in  size  and  the  lowest  in  peak  to  base 
ratios  relative  to  other  peer  groups. 

Peer  Group  2  consists  of  transit  providers  primarily  located  in  small  urban  areas 
or  suburban  areas  across  the  United  States  with  populations  under  500,000.  They  are 
small  (1  to  46  peak  vehicles),  fast  (17  to  22  miles  per  hour)  and  have  average  peak  to 
base  ratios. 

Although  Peer  Group  3  is  a  cross-national  group,  Southwestern  systems  are 
disproportionately  represented.  While  a  few  systems  are  in  the  suburban  fringes  of 
major  urban  areas,  most  are  in  small  cities  or  towns.  These  systems  are  small  (2  to 
74  peak  vehicles)  with  low  peak  to  base  ratios  (1.0  to  1.15)  and  above  average  speeds. 

Peer Group 4 draws from all parts of the country despite its small size. These
systems  serve  small  cities  with  suburban  characteristics.  Systems  in  Peer  Group  4 
have  a  high  average  speed  (15.9  to  16.8  miles  per  revenue  vehicle  hour)  and  they 
tend  to  be  small  (fewer  than  50  peak  vehicles)  with  low  peak  to  base  ratios.  Their 
speed  is  consistent  with  their  suburban  locations. 

Peer  Group  5  is  unusual  in  that  nearly  half  of  its  members  are  private  bus 
companies  in  the  urban  New  York  City  area.  Most  of  the  rest  are  small  Midwestern 
city  agencies.  The  systems  in  this  group  are  distinguished  by  their  very  low  speeds. 
They  are  slightly  below  average  in  size,  and  average  in  peak  to  base  ratios. 

Peer  Group  6  draws  systems  from  most  regions  of  the  United  States  but  with  a 
particular  emphasis  on  the  Midwest  and  South  central  regions.  While  a  few  medium 
sized  cities  are  included  in  this  group,  many  of  the  systems  serve  small  towns  or 
somewhat  rural  areas;  three-quarters  of  these  systems  are  in  areas  with  populations 
under  250,000.  Systems  in  this  peer  group  range  in  size,  but  are  generally  below 
average  in  number  of  peak  vehicles.  They  have  low  peak  to  base  ratios. 

Members of the largest peer group, Peer Group 7, are found in all parts of the
United  States.  They  primarily  serve  small  cities  and  large  towns  (77,000  to 
500,000),  although  a  number  are  in  metropolitan  New  York.  Systems  in  this  peer 
group  are  average  in  size  and  speed,  but  above  average  in  peak  to  base  ratios. 

Peer Group 8 consists primarily of systems in small to medium-sized Midwestern and
Eastern cities, although a few of its members are from the outer suburban sections of
New York and Chicago. It differs from other peer groups in its high average peak to
base ratio (all above 2.3). Systems in this peer group range widely in speed and size,
though there are no systems over 400 peak vehicles in this group.

Systems  in  Peer  Group  9  are  all  from  the  Southwestern  areas  of  the  United 
States.  They  predominate  in  suburban,  low  density  areas  with  populations  between 
0.5 and 1.5 million. Systems in this peer group are above average in size and speed,
and  about  average  in  their  peak  to  base  ratios. 

Transit  systems  in  Peer  Group  10  are  all  public  agencies  in  large  urban  areas  (1 
to  3  million),  in  most  regions  of  the  United  States  except  the  Northeast.  These 
systems  have  an  above  average  number  of  peak  vehicles  (260  to  506)  and  usually 
below  average  speeds,  with  a  wide  range  of  peak  to  base  ratios.  Peer  Group  10  is 
similar to Peer Group 11, though the systems are smaller on average and have
slightly  lower  peak  to  base  ratios. 

Peer  Group  11  includes  public  transit  agencies  in  major  urban  areas  (1.4  to  16 
million)  in  all  regions  of  the  United  States.  They  have  a  high  number  of  peak 
vehicles  (666  to  1573)  and  are  second  in  size  only  to  Peer  Group  12.  These  systems 
are  above  average  in  peak  to  base  ratio  and  are  average  in  speed. 

The  transit  agencies  in  Peer  Group  12  are  the  major  public  transit  providers  in 
the  three  largest  urban  areas  of  the  United  States.  All  three  have  over  1900  peak 
vehicles.  They  are  one  of  the  two  slowest  groups  of  systems,  and  they  have  slightly 
above  average  peak  to  base  ratios. 

A  peer  group  typology  based  upon  performance  characteristics  was  also 
devised. However, its usefulness is limited because its structure is less clear than
the  previous  typology  and  it  has  limited  applications  in  performance  evaluation. 
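
The grouping step described at the start of this section can be sketched as follows. This summary does not name the clustering algorithm, so the example uses plain k-means on standardized variables as a stand-in; that choice, the function names and the six invented systems are all assumptions for illustration and should not be read as the procedure reported in Chapter 3. Each row holds total vehicle miles (millions), peak vehicles, peak to base ratio and average speed.

```python
import math


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = []
    for col, m in zip(columns, means):
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        sds.append(sd if sd > 0 else 1.0)  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Plain k-means; the first k points seed the centers, so runs are repeatable."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


# Invented systems: (vehicle miles in millions, peak vehicles, peak/base, mph).
systems = [
    (0.5, 10, 1.1, 17.0),
    (40.0, 900, 2.0, 12.0),
    (0.8, 20, 1.2, 16.5),
    (45.0, 1000, 2.1, 11.5),
    (0.6, 15, 1.0, 18.0),
    (50.0, 1200, 1.9, 12.5),
]
peer_labels, _ = kmeans(standardize(systems), k=2)
```

Standardizing first matters because vehicle miles and peak vehicles are measured on much larger scales than speed or the peak to base ratio, and would otherwise dominate the distance calculation.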

Peer  Groups  and  Performance 

Chapter  4  describes  the  performance  of  each  peer  group  and  analyzes  the 
relationships  between  all  12  peer  groups  and  the  seven  performance  indicators. 
Much  more  information  is  presented  on  each  group  so  as  to  clarify  the  distinguishing 
characteristics  in  terms  of  operating  conditions  and  performance. 

Performance  profiles  were  constructed  for  each  peer  group  by  comparing  the 
peer  group's  average  performance  on  each  of  the  seven  performance  indicators  to 
the  national  average  for  each  indicator.  Graphical  representations  of  the  profiles 
revealed  that  each  peer  group  has  a  distinct  pattern  of  performance  across  the 
seven indicators. For instance, Peer Groups one and twelve are each very high on some
indicators and very low on others, but their relative strengths and weaknesses are quite
distinct.  Other  peer  groups,  such  as  Peer  Groups  four  and  seven,  are  much  closer  to 
the  national  averages  in  their  performance.  It  must  be  emphasized  that  comparing 
peer  group  performance  to  the  national  average  is  used  as  a  descriptive  device  and 
is  not  intended  to  suggest  that  these  are  norms  for  the  transit  industry.  Each  peer 
group  must  have  its  own  set  of  standards. 
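
A performance profile of the kind described above can be sketched in a few lines. The example assumes that a profile is expressed as the ratio of the peer group mean to the unweighted national mean for each indicator (1.0 meaning "at the national average"); the report may have used a different scaling, and the record layout and values below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def performance_profiles(records, indicators):
    """Return {peer_group: {indicator: group mean / national mean}}."""
    # National average taken as the unweighted mean over all systems (assumption).
    national = {ind: mean(r[ind] for r in records) for ind in indicators}
    by_group = defaultdict(list)
    for r in records:
        by_group[r["peer_group"]].append(r)
    profiles = {}
    for group, members in by_group.items():
        profiles[group] = {
            ind: mean(m[ind] for m in members) / national[ind]
            for ind in indicators
        }
    return profiles


# Invented records: two systems in each of two peer groups.
sample = [
    {"peer_group": 1, "TPAS/RVH": 10.0},
    {"peer_group": 1, "TPAS/RVH": 20.0},
    {"peer_group": 12, "TPAS/RVH": 60.0},
    {"peer_group": 12, "TPAS/RVH": 70.0},
]
sample_profiles = performance_profiles(sample, ["TPAS/RVH"])
```

As the text stresses, the national mean serves here only as a common reference line for describing the groups, not as a standard that any group is expected to meet.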

Statistical  analyses  were  also  done  to  show  that  the  peer  groups  are 
significantly  different  on  each  performance  indicator.  Analysis  of  variance  revealed 
that  performance  does  vary  across  peer  groups  for  all  seven  performance 
indicators. However, the results were slightly less significant for revenue generation
and  maintenance  efficiency. 
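
For readers unfamiliar with the technique, the F statistic of a one-way analysis of variance can be computed directly from the group values. The sketch below is a generic textbook calculation, not the software used in this research; the function name is an assumption, and it returns only the F ratio and its degrees of freedom, leaving the significance level to be looked up in an F table or obtained from a statistical package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented indicator values for three peer groups of three systems each.
f_value, df_b, df_w = one_way_anova_f(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
)
```

A large F ratio indicates that differences between the peer group means are large relative to the spread of systems inside each group, which is the sense in which the groups "differ" on an indicator.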

The  performance  indicators  were  also  examined  to  see  if  they  adequately 
discriminated  between  peer  groups  both  in  terms  of  average  performance  and  the 
range  of  values.  It  was  found  that  each  peer  group  did  have  its  own  unique  range  of 
values  on  most  of  the  indicators  reflecting  important  and  practical  differences 
between  transit  systems  operating  in  different  circumstances.  Some  cautions  are 
given  for  use  of  certain  performance  indicators  because  performance  is  so  varied 
within  some  peer  groups  or  the  indicator  is  more  valid  for  certain  types  of  transit 
systems. 

The final analysis in Chapter 4 addresses the question of how the four
operating  characteristics  used  to  form  the  peer  groups  relate  to  performance. 
Multiple  correlation  analyses  and  a  comparison  of  performance  profiles  showed  that 
size  of  a  transit  system,  as  measured  by  the  number  of  peak  vehicles,  was  the  most 
important variable in predicting differences in performance. However, both speed
and  peak  to  base  ratio  made  significant  contributions  to  accounting  for  differences 
in  performance  for  some  of  the  performance  indicators.  Labor  efficiency  was  most 
strongly  related  to  speed  and  vehicle  efficiency  was  most  strongly  related  to  the 
peak  to  base  ratio.  Revenue  generation  was  not  significantly  related  to  any  of  the 
operating  characteristics  although  together  they  predicted  a  significant  amount  of 
the  variance  between  systems. 
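
The closing point, that characteristics which are individually weak predictors can still account jointly for a meaningful share of the variance, is the idea behind the squared multiple correlation. The sketch below is a simplified illustration using only two predictors and the standard closed-form expression in terms of pairwise correlations; the analysis described above used four operating characteristics, and the function names and data here are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_r_squared(y, x1, x2):
    """Squared multiple correlation of y on two predictors x1 and x2."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    if abs(r_12) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Invented data: the indicator is exactly the sum of the two predictors,
# and the predictors are uncorrelated with each other.
size_score = [1.0, 2.0, 3.0, 4.0]
speed_score = [1.0, -1.0, -1.0, 1.0]
indicator = [2.0, 1.0, 2.0, 5.0]
r_squared = multiple_r_squared(indicator, size_score, speed_score)
```

In this constructed case neither predictor alone explains more than about 56 percent of the variance, yet together they explain all of it.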

Use  of  the  Results 

Although  the  results  of  previous  research  are  already  in  use,  the  current 
research  will  confirm  the  validity  of  using  a  small  set  of  indicators  and  encourage 
meaningful comparison between similar systems.

California is already requiring only five indicators based upon previous
research.  Other  states  including  Florida,  Iowa,  Michigan  and  Pennsylvania  have  used 
these  same  performance  concepts  to  develop  performance  monitoring  and  reporting 
requirements.  As  a  result  of  this  research,  they  will  be  able  to  use  the  Section  15 
data  with  confidence. 

Improved  utilization  of  Section  15  data  at  the  transit  agency  level  promises 
even  more  beneficial  results.  Using  the  preliminary  results  of  this  research,  the 
Orange  County  Transit  District,  California,  the  Transit  Department  of  Seattle 
METRO  and  the  New  York  M.T.A.  have  revised  their  management  information 
systems  to  provide  quarterly  reports  representing  the  major  dimensions  of 
performance  based  upon  the  Section  15  data  format.  It  will  be  important  to  study 
these  results  in  future  years,  as  well  as  to  examine  the  consequences  for  agencies 
like  the  Washington  D.C.  Metropolitan  Area  Transit  Authority  that  use  a  much 
larger  list  of  performance  indicators. 

A  related  report  was  prepared  for  UMTA  to  assist  in  the  preparation  of  the 
report  to  Congress  on  the  status  of  the  nation's  urban  public  transportation.  Both 
FY  1980  and  FY  1981  data  were  reported  for  the  seven  performance  indicators 
identified  by  this  research.  The  results  were  reported  as  nationally  aggregated 
statistics  as  well  as  by  the  twelve  peer  groups.  These  results  will  allow  the 
Secretary  of  Transportation  to  report,  not  only  aggregate  changes  in  American 
transit,  but  also  changes  by  national  peer  group. 

Results  have  already  been  used  in  management  training.  A  full  day  is  devoted 
to  the  use  of  Section  15  data  in  analyzing  transit  performance  at  the  Transit 
Managerial  Effectiveness  Program  that  is  offered  by  the  UMTA  University  Center 
for  Transit  Research  and  Training  located  at  Irvine.  Transit  managers  become 
familiar  with  performance  analysis,  learn  statistical  concepts  and  gain  experience 
with computers by using Section 15 data sets developed in this research.

Another  result  from  the  research  has  been  the  independent  assessment  of  the 
Section 15 federal data submission requirement. It is essential that it be continued
and  the  accuracy  of  information  reported  be  improved.  It  is  also  recommended  that 
revisions  be  made  in  the  current  requirements  and  that  more  data  be  requested  on 
the operating environment of each agency.


CHAPTER 1
INTRODUCTION AND DATA ASSESSMENT


INTRODUCTION 

Research on transit performance is important for policy analysis. It allows
federal,  state  and  local  agencies  to  outline  objectives  for  transit,  to  experiment 
with  new  forms  of  service  and  to  monitor  performance.  It  can  also  assist  transit 
managers:  using  a  small  set  of  performance  indicators,  they  can  reliably  track 
overall  performance  of  an  agency  and  evaluate  one  agency  against  others  with 
similar  operating  characteristics. 

Evaluation  is  neglected  in  public  policy  research.  A  great  deal  of  attention  is 
devoted  to  the  formulation  of  government  programs  and  their  implementation. 
However,  few  researchers  have  been  concerned  with  measurement  and  analysis  of 
programs  once  they  have  been  implemented.  Elinor  Ostrom's  research  on  police 
services is an exception.^1 The present report expands evaluation research by
defining  measurement  and  analysis  structures  for  American  public  transit. 

The  goal  has  been  to  provide  a  systematic  procedure  for  judging  the  merits  of 
public transit programs. This is accomplished by constructing a model of
transit performance, by showing that a small set of seven indicators can reliably
represent the major dimensions of transit performance, and by showing further that
these indicators can be used to evaluate individual systems within "peer groups" of
similar systems defined by inherent operating characteristics.

Judging the merits of public transit programs is seldom value-free. But, by
using the model outlined by this research, analysts can be assured that they are
encompassing the major dimensions. Experimentation with different kinds of transit
provision and with new methods for satisfying demand is necessary in an industry
that has grown rapidly during the past two decades while being responsive to
changing policy objectives.^2 Costs per vehicle hour have been rising faster than
inflation, employee productivity has declined and passengers per revenue vehicle
hour have remained static.


^1 Elinor Ostrom et al., Community organization and the provision of police
services. (Beverly Hills, Calif.: Sage Professional Papers in Administrative and
Policy Studies, 1973.)

^2 Gordon J. Fielding, Changing objectives for American transit (Parts I and II),
Transport Reviews, 1983, 3 (3 and 4), 287-299, 341-362.

Policy  and  evaluation  objectives  for  public  transit  are  similar  to  those  in  other 
social  services:  to  determine  whether  service  is  being  produced  efficiently  and  used 
effectively.  As  Alice  M.  Rivlin  suggests  in  Systematic  Thinking  for  Social  Action: 
"Unless  we  begin  searching  for  improvements  and  experimenting  with  them  in  a 
systematic  way,  it  is  hard  to  see  how  we  will  make  much  progress  in  increasing  the 
effectiveness of our social services."^3

Current State Use of Section 15 Data

A  number  of  states  have  already  undertaken  the  task  of  identifying  the 
important  dimensions  of  transit  performance.  Site  visits  and  questionnaire  surveys 
were  used  in  Florida  to  define  eight  operational  goals.  Quantitative  measures 
derived  from  the  Section  15  data  base  were  then  identified  for  each  of  the  goals, 
and standard value ranges established.^4 Special software was developed in Iowa so
that both urban and rural transit systems could easily be compared using the
categories defined by the UMTA Section 15 requirements. The software system was
tested and then used to develop performance standards for use in performance
audits.^5 Michigan has also developed an evaluation system using Section 15 data to
promote the efficient and effective use of state funding for transit.^6

While all of these state applications demonstrate the usefulness of Section 15
data for performance evaluation, much less effort has been spent to develop a
nationwide system of performance evaluation. Within-state analyses suffer because
of the relatively small number of transit systems in any one state. With a larger
number of transit systems, it is possible to better establish the reliability and
internal validity of specific performance indicators.


^3 Alice M. Rivlin, Systematic thinking for social action. (Washington: The
Brookings Institution, 1971), p. 119.

^4 Post, Buckley, Schuh and Jernigan, Inc., Florida transit system performance
measures and standards. (Tallahassee, FL: Florida Department of Transportation,
Public Transportation Operations Division, 1979.)

^5 Iowa Department of Transportation, Uniform data management system:
System development and testing. Report No. DOT-I-81-2. (Ames, Iowa: Iowa
Department of Transportation, October 1980.)

^6 James M. Holec, Dianne S. Schwager, and Angel Fandalian, Use of federal
Section 15 data in transit performance: Michigan program. Transportation
Research Record No. 765. (Washington, D.C.: Transportation Research Board,
1980), pp. 36-38.

Finding  a  set  of  "peer"  transit  systems  in  an  intra-state  comparative  analysis  is 
also  difficult.  Few  states  have  more  than  a  few  large  urban  transit  systems  and 
some  states  have  only  a  limited  number  of  transit  providers  of  any  size.  The 
establishment  of  peer  groups  across  state  boundaries  allows  for  the  construction  of 
reasonably  sized  peer  groups  while  controlling  for  the  large  variation  in  operating 
environments.  Current  within-state  peer  groups  typically  vary  in  limited  ways — by 
size,  mode  or  urban-rural  location.  A  set  of  peer  groups  for  the  nation  can  capture 
finer  distinctions  in  other  aspects  of  the  operating  environment. 

The  research  described  in  this  report  updates  previous  work  on  a  nationwide 
approach  to  performance  evaluation. 

Links  to  Previous  Research 

The first year of Section 15 reported statistics (FY 1979)^7 was used by
Anderson and Fielding^8 to test the performance concept model developed by
Fielding, Glauthier, and Lave.^9 Nine dimensions of performance, developed from 60
measures, were used to develop a performance index which could be applied to
individual transit properties. Although the results of this previous research have
been widely used, the Principal Investigator was concerned over the validity of the
results because of the limitations of the inaugural data from the Section 15
requirements. The Transportation Systems Center (TSC) compiled the first annual
report based on FY 1979 data but warned that "care should be taken in the
application and use of the data as presented."^10 Although there was extensive
checking and editing, reporting deficiencies and erroneous data remained in the FY
1979 data tape supplied to the UCI research team. Omission of important data was
common and in other instances obviously erroneous data remained uncorrected.
Where possible the errors were corrected or entries deleted so that the data set
(UCI data set) used for the research differed from the TSC data set. Replication of
the results published in 1982 was a principal reason for conducting the current study.


^7 U.S. Department of Transportation, Transportation Systems Center, National
urban mass transportation statistics: First annual report, Section 15 reporting
system: Transit financial and operating data reported for fiscal years ending
between July 1, 1978 and June 30, 1979. Report No. UMTA-MA-06-0107-81-1.
(Washington, D.C.: U.S. Government Printing Office, May 1981.)

^8 Shirley C. Anderson and Gordon J. Fielding, Comparative analysis of transit
performance. Final report No. UMTA-CA-11-0020-1. (Irvine, Calif.: University of
California, Institute of Transportation Studies, January 1982.) (NTIS No. PB
82-196478).

^9 Gordon J. Fielding, Roy E. Glauthier, and Charles A. Lave, Development of
performance indicators for transit. Final report No. UMTA-CA-11-0014-78-1.
(Irvine, Calif.: University of California, Institute of Transportation Studies,
December 1977.) (NTIS No. PB 278 678).

The  second  year  data  (FY  1980)  became  available  in  July  1982  and  was  both 
more complete and more carefully verified by TSC.^11 Therefore, research was
proposed  which  would: 

1. Use the FY 1980 data to test the validity of the set of performance
indicators  developed  from  the  FY  1979  data  for  fixed-route  bus  operators, 
and 

2.  Define  relatively  homogeneous  groups  of  operators  (peer  groups)  that 
could  be  compared  in  terms  of  performance. 

Agencies  operating  304  bus  systems  were  included  in  the  study.  Rail  operations 
were excluded: both exclusive rail operators like the Bay Area Rapid
Transit District and the rail operation statistics reported by mixed-mode
operators like the Chicago Transit Authority. Bus operating statistics for
mixed-mode operators were included. Exclusive demand-responsive operators were
excluded.

Chapters  in  this  report  respond  to  the  two  objectives.  The  latter  portion  of  this 
chapter  reviews  the  data  and  the  methods  used  to  correct  problems  and  to  reformat 
data for statistical analysis. Chapter 2 reports the analysis of a large set of
performance variables in conjunction with factor analysis to establish seven
dimensions of transit performance. The chapter also reports tests of the validity of
using a small set of indicators to represent these dimensions. Seven marker
indicators were chosen rather than the nine proposed in the 1982 report. Hence this
chapter revises the previous results.


^10 U.S. Department of Transportation, Transportation Systems Center,
National urban mass transportation statistics: First annual report, Section 15
reporting system: Transit financial and operating data reported for fiscal years
ending between July 1, 1978 and June 30, 1979. Report No. UMTA-MA-06-0107-81-1.
(Washington, D.C.: U.S. Government Printing Office, May 1981), p. vi.

^11 TSC supplied the data as a magnetic tape divided into 62 data files.
Although these same data were used to prepare the Second Annual Report of National
urban mass transportation statistics, 1982, the format is quite different. This is
discussed later in this chapter.

Chapter  3  describes  the  creation  of  a  typology  for  transit  systems  using 
characteristics  of  transit  operations  that  are  available  in  the  Section  15  statistics. 
Agency  size,  peak-to-base  ratio  and  average  bus  speed  are  used  to  create  12  peer 
groups. This new typology updates and supersedes the typology briefly described in
1982.  Chapter  4  describes  the  performance  profile  of  each  group  and  analyzes  the 
relationships  between  all  12  peer  groups  and  the  seven  performance  indicators. 

Chapters 1-4 are somewhat technical. Chapter 5 summarizes the achievements
of this research in less technical terms.

Use  of  the  Results 

Although the results of previous research are already in use, the current
research will confirm the validity of using a small set of indicators and encourage
meaningful comparison between similar systems.

California is already requiring only five indicators based upon the 1977
research.^12 Other states including Florida, Iowa, Michigan and Pennsylvania have
used these same performance concepts to develop performance monitoring and
reporting requirements.^13 As a result of this research, they will be able to use the
Section 15 data with confidence, and change the weights of the dimensions to
emphasize either efficiency or effectiveness attributes.

Improved utilization of Section 15 data at the individual transit property level
promises even more beneficial results. Using the preliminary results of this
research, the Orange County Transit District, California, and the Transit
Department of Seattle METRO have revised their management information systems
to provide monthly and quarterly reports representing the major dimensions of
performance based upon the Section 15 data format. It will be important to study
these results in future years, as well as to examine the consequences for agencies
like the Washington D.C. Metropolitan Area Transit Authority that use a much
larger list of performance indicators.


^12 California. Business and Transportation Agency. Transportation
Development Act: Statutes as amended and related sections of the California
Administrative Code as adopted by the Secretary of the Business and Transportation
Agency. Report No. DMT-032. (Sacramento, Calif.: California Department of
Transportation, Division of Mass Transportation, February 1978.)

^13 James H. Miller, The use of performance-based methodologies for the
allocation of transit operating funds, Traffic Quarterly, October 1980, 34(4),
555-585.

A related report^14 was prepared for UMTA to assist in the preparation of the
report  to  Congress  on  the  status  of  the  nation's  urban  public  transportation.  Both 
FY  1980  and  FY  1981  data  were  reported  for  the  seven  performance  indicators 
identified  by  this  research.  The  results  were  reported  as  nationally  aggregated 
statistics  as  well  as  by  the  12  peer  groups.  These  results  will  allow  the  Secretary  of 
Transportation  to  report,  not  only  aggregate  changes  in  American  transit,  but  also 
changes  by  national  peer  group.  As  more  years  of  Section  15  data  become  available, 
it  will  be  possible  to  do  more  longitudinal  studies  of  changes  in  different  types  of 
transit  providers  as  represented  by  the  peer  groups. 

Results  have  already  been  used  in  management  training.  A  full  day  is  devoted 
to  the  use  of  Section  15  data  in  analyzing  transit  performance  at  the  Transit 
Managerial  Effectiveness  Program  that  is  offered  by  the  UMTA  University  Center 
for  Transit  Research  and  Training  located  at  Irvine.  Transit  managers  become 
familiar  with  performance  analysis,  learn  statistical  concepts  and  gain  experience 
with computers by using Section 15 data sets developed in this research.

Another  result  from  the  research  has  been  the  independent  assessment  of  the 
Section  15  federal  data  submission  requirement.  The  research  results  demonstrate 
the  usefulness  of  the  requirement.  It  is  essential  that  it  be  continued  and  accuracy 
improved.  It  is  also  recommended  that  revisions  be  made  in  the  current 
requirements  and  that  more  data  be  requested  on  the  operating  environment  for 
each  property.  Suggestions  have  been  made  by  the  research  team  to  the  UMTA 
Section  15  Reporting  System  Advisory  Committee  appointed  by  the  U.S.  Secretary 
of  Transportation  in  1983  on  sampling  of  passenger  statistics,  deletion  of  "road  call" 
data  and  improved  definition  for  accidents  statistics. 


^14 Gordon J. Fielding and Katherine Faust, Dimensions of bus performance for
peer groups of transit agencies in fiscal years 1980 and 1981 using Section 15 data.
Working Paper No. 83-5. (Irvine, Calif.: University of California, Institute of
Transportation Studies, December 1983.)


DATA AVAILABILITY

Statistics  reported  in  compliance  with  Section  15  of  the  Urban  Mass 
Transportation  Act  of  1964,  as  amended,  were  used  as  the  source  for  all  data  used 
for  analysis  since  the  purpose  of  this  project  was  to  develop  a  system  of  transit 
evaluation  that  would  be  applicable  and  available  throughout  the  United  States. 
Data  for  fiscal  year  1980  (the  second  year  of  data)  were  obtained  from  the 
Transportation  Systems  Center  (TSC),  Cambridge,  Massachusetts.  The  data  were 
supplied  in  the  form  of  a  magnetic  tape  divided  into  62  data  files,  roughly 
corresponding  to  the  reporting  forms  filled  out  by  transit  systems.  The  data  tape 
differs from the published version of the statistics^15 in that it is more
comprehensive  and  organized  in  a  more  complex  way. 

Although  the  Section  15  tape  is  the  most  extensive  and  uniform  set  of  data 
available  for  transit  at  the  current  time,  there  are  major  problems  which  must  be 
overcome.  The  magnetic  tape  must  be  substantially  reorganized  before  it  can  be 
used  with  any  of  the  major  statistical  software  packages.  The  tape  has  a  complex 
organization  because  it  includes  information  from  four  different  reporting  levels  (R, 
A,  B,  C)  and  for  seven  different  modes  of  transit.  Any  single  transit  system  will  be 
reporting  at  only  one  reporting  level  and  for  some  sub-set  of  modes.  Further,  some 
individual  items  on  the  forms  are  irrelevant  to  a  specific  transit  system.  Rather 
than  leaving  blank  space  for  the  irrelevant  items  or  coding  them  as  missing 
information,  TSC  has  employed  a  hierarchical  coding  scheme  which  allows  for  the 
economical  and  methodical  coding  of  different  sub-sets  of  information  for  each 
transit  system.  The  following  section  of  this  chapter  examines  this  problem  in 
detail  and  demonstrates  how  the  hierarchical  structure  was  converted  to  a 
statistical  structure.  The  data  themselves  must  be  carefully  scrutinized  for  validity 
and reliability.^16 Since few researchers have worked with the Section 15 data tape,
the data reorganization and validation procedures are described in some detail
in this report. Further information can be obtained from the technical report^17
completed as part of the grant.


^15 U.S. Department of Transportation, Urban Mass Transportation
Administration, National Urban Mass Transportation Statistics: Second Annual
Report, Section 15 Reporting System. (Washington, D.C.: U.S. Department of
Transportation, July 1982.)

^16 Beginning with the FY 1981 data, TSC started to validate the data with
methods similar to the ones reported here. Thus the quality of data provided will
improve with each successive year. However, the same organization of the data tape
will be used at least until FY 1982.

Certain kinds of information are not available from Section 15 data. Detailed
information on the service areas of specific transit systems is not included. Thus it
is not possible to examine how well individual transit systems serve specific
sub-populations (e.g., elderly, low income, transit dependent) or how geographical
features of service areas (urban density, mean temperatures) affect service demand
and cost. Organizational features of transit systems such as whether the labor force
is unionized, the ownership is public or the primary service is commuter oriented are
also unavailable. An earlier project^18 tried to use published sources of information
to supplement the Section 15 data but found that these sources seldom had units of
analysis that were comparable to the service areas of transit systems.

It is possible to combine Section 15 data with census data by taking an
urban area as defined by census standards (e.g., SMSA) as the unit of analysis and
aggregating the information for all transit systems that serve that urban area.
Vaziri and Deacon^19 have demonstrated how performance evaluation can be done in
this way. However, this project sought to provide results that would be useful to
the managers of individual transit systems, and an urban area analysis would be
relevant to only a few large transit systems whose service areas correspond to the
urban area. It was therefore decided to work with individual transit systems using
primarily Section 15 data. These data are the most readily available to transit
managers, not only in magnetic tape form but also in published form and on diskettes
for microcomputers.

Preparation  of  the  data  was  done  in  three  major  phases — reorganization  of  the 
data  into  a  format  suitable  for  statistical  analysis,  calculation  and  validation  of  data 
values,  and  evaluation  of  the  quality  and  properties  of  specific  variables. 


^17 Gordon J. Fielding, Mary E. Brenner, and Olivia de la Rocha, Using Section
15 data for transit performance analysis. Interim report No. UMTA-CA-11-0026-1.
(Irvine, Calif.: University of California, Institute of Transportation
Studies, January 1983.)

^18 Shirley C. Anderson and Gordon J. Fielding, Comparative analysis of transit
performance. Final report No. UMTA-CA-11-0020-82-1. (Irvine, Calif.: University
of California, Institute of Transportation Studies, January 1982.)

^19 Manoucher Vaziri and John A. Deacon, Application of Section 15 and census
data to transit decision making. Final report No. UMTA-KY-11-0002-83.
(Springfield, Va.: National Technical Information Service, 1983.)
DATA REORGANIZATION

One  of  the  most  important  problems  to  be  overcome  in  working  with  the 
Section  15  tape  was  reorganizing  the  data  into  a  form  suitable  for  statistical 
analysis.  In  the  following  sections  the  circumstances  making  reorganization  of  the 
files  necessary  and  the  steps  required  to  accomplish  the  reorganization  are 
explained.  After  a  brief  discussion  of  the  problems  connected  with  defining 
electronic  data  for  statistical  analysis,  it  will  be  shown  how  the  tape  files  diverge 
from  a  conventional  statistical  format.  The  discussion  will  then  focus  on  the 
reorganization  process  both  conceptually  and  from  the  point  of  view  of 
programming.  It  will  conclude  with  remarks  on  other  data  processing  problems 
encountered  in  managing  the  tape  and  a  summary  of  the  files  and  variables 
reorganized. 

Defining  Data  for  Statistical  Analysis 

Several  background  concepts  are  useful  in  understanding  the  nature  of  the 
organizational  problems.  While  all  the  numbers  in  a  data  file  are  organized  in  rows 
and  columns,  the  meaning  of  the  numbers  is  not  inherent  in  the  row  and  column 
organization.  It  must  be  conveyed  to  the  computer  by  the  programmer.  The  system 
or  scheme  used  by  the  programmer  to  give  meaning  to  the  array  of  numbers  is  the 
logical  organization. 

The  specification  of  the  logical  organization  is  laid  out  in  a  document  called  a 
codebook.  In  a  codebook  the  meaning  of  data  is  defined  by  the  way  the  numbers  are 
organized  into  sets  of  columns.  A  large  number  like  $4,000,000  takes  up  7  columns, 
for  example.  The  assigned  sets  of  columns  are  called  fields,  and  each  unique  set  of 
information  items  filling  up  the  fields  is  called  a  record. 

Table  1-1  shows  a  codebook  from  TSC's  documentation.  According  to  the 
codebook,  columns  one  through  four  of  the  number  array  have  been  reserved  for 
Transit  System  ID.  Columns  five  through  twelve  are  reserved  for  the  fiscal  year  end 
date  for  the  system  which  is  identified  in  columns  one  through  four.  Column 
thirteen  is  assigned  to  the  mode  code.  And  so  it  goes.  With  the  help  of  this  scheme 
the  computer  can  be  informed  about  the  meaning  of  the  data  by  the  way  fields  in 
the  block  of  numbers  are  assigned.  This  process  is  called  formatting. 
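
As a minimal sketch of what formatting means in practice, the Python fragment below applies the column assignments of Table 1-1 to one fixed-width record. The function name `parse_record` and the sample record are assumptions made for illustration (the capital labor value in particular is invented); the project itself did this work with SPSS and FORTRAN, not Python.

```python
# Codebook from Table 1-1: (field name, first column, last column, converter).
# Columns are 1-indexed and inclusive, as in the TSC documentation.
CODEBOOK = [
    ("TRSID", 1, 4, int),
    ("FY", 5, 12, str),
    ("MODE", 13, 13, int),
    ("EMCOD", 14, 15, int),
    ("OLABR", 16, 21, float),
    ("CLABR", 22, 27, float),
]


def parse_record(line):
    """Apply the codebook to one fixed-width record and return named fields."""
    record = {}
    for name, first, last, convert in CODEBOOK:
        raw = line[first - 1:last]  # convert 1-indexed columns to a slice
        record[name] = convert(raw.strip())
    return record


# Hypothetical 27-column record; the CLABR value is invented for illustration.
sample = "1056" "19800630" "1" "11" "4.5000" "0.0000"
fields = parse_record(sample)
```

The same block of characters would yield different fields under a different codebook, which is why the logical organization has to be supplied by the programmer.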

By formatting, fields are named so that any number found in that space by the
computer can be presumed to have the assigned meaning. The computer can then
interpret each record it encounters in the data array by the same standard. There is
some flexibility in the way data may be formatted, and there may be more than one
meaningful logical organization for the same data file.

TABLE 1-1. A SAMPLE CODEBOOK

COLUMN     NAME     TYPE       DESCRIPTION
1 - 4      TRSID    INTEGER    TRANSIT SYSTEM ID
5 - 12     FY       DATE       FISCAL YEAR
13 - 13    MODE     INTEGER    MODE CODE
14 - 15    EMCOD    INTEGER    EMPLOYEE CLASS CODE
16 - 21    OLABR    REAL       OPERATING LABOR
22 - 27    CLABR    REAL       CAPITAL LABOR

Two additional concepts fill out the data definition problem for statistical
analysis. Statistical procedures operate by making systematic comparisons among
objects.  The  objects  are  compared  on  those  attributes  which  have  been  measured  in 
some  way.  For  example,  in  later  analyses  transit  systems  are  compared  on  such 
attributes  as  size  of  fleet  and  speed. 

In  statistical  data  files  the  most  important  organizational  units  are  cases 
(objects)  and  variables  (attributes).  A  case  may  be  thought  of  as  the  full  collection 
of  information  items  defined  in  the  codebook  for  a  single  transit  agency.  If  some 
defined  item  is  missing,  the  statistical  case  is  incomplete,  and  a  place-holding  code 
must  be  inserted  to  fill  it  out. 

A  variable,  like  a  case,  is  a  statistical  concept.  When  all  cases  have  been 
measured  on  a  given  attribute,  the  resulting  collection  of  values  is  organized  in  a 
list  called  a  variable.  Statistical  procedures  compare  these  lists  and  depend  on  the 
fact  that  cases  always  appear  in  the  same  order.  Once  again  if  no  place-holder 
resides  in  the  position  of  a  missing  item,  the  order  is  disturbed  and  statistical  results 
are  rendered  meaningless. 

In formatting data to be read by a statistical package like SPSS^20 or BMDP,^21
the electronic definition of cases and variables depends on there being a
uniform amount of information for every case and every variable. Each case must
have values or stand-in values for every variable, and every variable must have
values or stand-in values for every case.


^20 Norman H. Nie, C. Hadlai Hull, Jean G. Jenkins, Karin Steinbrenner and Dale
H. Bent, SPSS: Statistical package for the social sciences. (New York:
McGraw-Hill, Inc., 1975.)

^21 W. J. Dixon, Ed., BMDP Statistical software 1981. (Los Angeles: University
of California Press, 1981.)

The  Divergence  of  Tape  Files  from  Conventional  Statistical  Format 

The circumstances leading to the need to reorganize the tape files arise in the
way the structure of the files is closely linked to the reporting forms. This close
linkage leads to two problems in formatting the data files for statistical purposes.
First, information whose presence was predicted by the reporting forms was absent
in the actual data, requiring the insertion of place-holding values. Second,
preparation of the data in the same format as the reporting forms required the
design of a new codebook before the data could be read for statistical purposes.
Form 404, Transit System Employee Count Schedule, can be used to illustrate how
both problems arise.

Missing records. Figure 1-1 shows Form 404 and the information submitted by
one transit system as it is recorded in a data file on the tape. A comparison of the
form and the data shows the first three fields, TRANSIT SYSTEM ID, FISCAL YEAR
ENDED and MODE, coming from the top of the form and repeating on every record
in the data. The next two fields, EMPLOYEE CLASSIFICATION (EC) and
OPERATING LABOR (OLABR), are taken from the "Employee Classification" and
"Operating Labor" sections of the form. (Information about capital labor is omitted
from the example.) The Figure shows a one-to-one correspondence between the
numbers assigned to employee categories on the form (11, 12, 13, etc.) and the
values under EC in the data. However, the one-to-one correspondence is not quite
complete. If Form 404 were used to construct a codebook which acted as the logical
organization for the data appearing in Figure 1-1, then there would be a discrepancy
between what the logical organization predicts and what actually appears in the
data file. There is no record appearing for category 22, Maintenance Support
Personnel, in the data file.

This circumstance violates the requirement of statistical software packages for
complete information in all variables and cases, and some entry to stand in for the
missing category 22 must be supplied or the data will not be correctly read. Until a
stand-in value (or dummy record) is inserted, the information cannot be said to form
a complete, statistically analyzable case. Therefore, all such instances of "missing"
information had to be remedied before statistical analysis could commence.
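
The gap can be illustrated with a short Python sketch that compares the employee codes printed on Form 404 with the records present for system 1056 in Figure 1-1 and fills the gap with a dummy record. The function name `complete_case` and the use of `None` as the stand-in are assumptions made for this illustration, not the project's actual routine.

```python
# Employee classification codes printed on Form 404 (00 is the total line).
FORM_404_CODES = ["11", "12", "13", "21", "22", "23", "24", "25", "31", "32", "00"]

# Records for system 1056 as shown in Figure 1-1: (ID, EC, OLABR).
records = [
    ("1056", "11", 4.5), ("1056", "12", 2.5), ("1056", "13", 47.8),
    ("1056", "21", 2.3), ("1056", "23", 5.6), ("1056", "24", 0.5),
    ("1056", "25", 2.6), ("1056", "31", 1.0), ("1056", "32", 2.3),
    ("1056", "00", 67.1),
]


def complete_case(system_id, records, placeholder=None):
    """Return one record per form line, inserting a dummy where none exists."""
    by_code = {ec: value for sid, ec, value in records if sid == system_id}
    return [(system_id, ec, by_code.get(ec, placeholder)) for ec in FORM_404_CODES]


case = complete_case("1056", records)
missing = [ec for _, ec, value in case if value is None]
```

With the dummy record in place the case has eleven entries in form order, so each position can be read as the same variable for every system.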
FIGURE 1-1. CORRESPONDENCE BETWEEN REPORTING SYSTEM FORMS
AND THE ORGANIZATION OF TSC DATA FILES

Form No. 404

TRANSIT SYSTEM EMPLOYEE COUNT SCHEDULE

Transit System ID [1056]            Level [ ]
Fiscal Year Ended [06] [30] [80]    Mode motorbus    Code [1]

EMPLOYEE CLASSIFICATION                                            OPERATING LABOR
11. Transportation Executive, Professional and
    Supervisory Personnel                                           4.5
12. Transportation Support Personnel                                2.5
13. Revenue Vehicle Operators                                      47.8
21. Maintenance Executive, Professional and
    Supervisory Personnel                                           2.3
22. Maintenance Support Personnel
23. Revenue Vehicle Maintenance Mechanics                           5.6
24. Other Maintenance Mechanics                                      .5
25. Vehicle Servicing Personnel                                     2.6
31. General Administration Executive, Professional
    and Supervisory Personnel                                       1.0
32. General Administration Support Personnel                        2.3
00. TOTAL TRANSIT SYSTEM EMPLOYEES                                 67.1

ID    FY        M  EC  OLABR
1056  19800630  1  11  4.5000
1056  19800630  1  12  2.5000
1056  19800630  1  13  47.800
1056  19800630  1  21  2.3000
1056  19800630  1  23  5.6000
1056  19800630  1  24  .50000
1056  19800630  1  25  2.6000
1056  19800630  1  31  1.0000
1056  19800630  1  32  2.3000
1056  19800630  1  00  67.100

ID=ID NUMBER
FY=FISCAL YR END DATE
M=MODE
EC=EMPLOYEE CODE
OLABR=OPERATING LABOR
(CAPITAL LABOR VALUES OMITTED)
Designing a new codebook. A second important consequence of the
correspondence between the data and the forms is the way values are compared,
i.e., which values are making up the variables. Again Figure 1-1 is used to
illustrate. In a statistical routine, the OLABR value of 4.5000 cannot be compared
to the OLABR value of 2.5000 beneath it, as would usually be the case in a file
ordered for statistical analysis. Instead, the 4.5000 must be compared to another
value, not shown in Figure 1-1, which has an EC of 11, but a different ID number.
OLABR, therefore, is not one variable but eleven variables (the number of employee
classifications) collected together in one field. Without some new way of defining
the data, the statistical routine would compare the number of Revenue Vehicle
Operators to the number of Vehicle Servicing Personnel in the same system, when
the need is to compare the number of one system's Revenue Vehicle Operators to
the number of Revenue Vehicle Operators employed by another system.

Informing the computer of this relationship between the values in the OLABR
field requires devising a new logical organization or codebook for the data to
replace that found in the TSC documentation. Figure 1-2 shows most of the TSC
codebook for Form 404 in its original and revised forms. For statistical purposes,
OLABR in Codebook I is too general a category to qualify as a variable. Instead the
eleven variables embedded in the OLABR field require the new definition given
them in Codebook II. Additional comparison of the two codebooks uncovers another
important difference. Codebook I "reads" or formats only one line of Form 404 at a
time, and in that sense it has no inbuilt way of defining a statistical case. No higher
level of organization clustering records together to form a case exists. Codebook II,
on the other hand, reads the whole form, defines the individual lines on the form as
variables, and the whole form as a statistical case. Eleven separate records under
the old scheme are clustered together as a case in the new one.

The  kind  of  organization  found  in  the  TSC  tape  files  is  common,  economical, 
and  often  used  as  input  to  management  information  systems  using  customized 
software.  File  organization  of  this  kind  is  referred  to  as  hierarchical  ordering  by 
computer  scientists. 

By  way  of  summary,  then,  two  major  problems  motivated  the  data 
reorganization:  (1)  the  absence  of  stand-in  values  for  missing  records;  and  (2) 
hierarchical  ordering  of  data.  In  the  following  section  the  strategy  used  for  solving 
these  problems  is  described. 
FIGURE 1-2. A COMPARISON OF ORIGINAL
AND REVISED CODEBOOKS USED IN FORMATTING TAPE FILES


CODEBOOK I. UNREVISED TSC DOCUMENTATION

COLUMN     NAME     TYPE       DESCRIPTION
1 - 4      TRSID    INTEGER    TRANSIT SYSTEM ID
5 - 12     FY       DATE       FISCAL YEAR
13 - 13    MODE     INTEGER    MODE CODE
14 - 15    EMCOD    INTEGER    EMPLOYEE CLASS CODE
16 - 21    OLABR    REAL       OPERATING LABOR


CODEBOOK II. REVISED TSC DOCUMENTATION

CARD   COLUMN     NAME        DESCRIPTION
1      1 - 4      TRSID1      TRANSIT SYSTEM ID FOR CARD 1
       5 - 12     FY          FISCAL YEAR
       13 - 13    MODE        MODE CODE
       14 - 15    EMCOD       EMPLOYEE CLASS CODE
       16 - 21    TNSEXOL     TRANS. EXEC, PROF., AND SUPP. OP LABR.
2      1 - 4      TRSID2      TRANSIT SYSTEM ID FOR CARD 2
       5 - 15     OMITTED
       16 - 21    TNSSPOL     TRANS SUPP PERSONNEL OP LABOR
3      1 - 4      TRSID3      TRANSIT SYSTEM ID FOR CARD 3
       5 - 15     OMITTED
       16 - 21    RVEHOPOL    REVENUE VEHICLE OPERATORS OP LABOR
.
.
.
11     1 - 4      TRSID11     TRANSIT SYSTEM ID CARD 11
       5 - 15     OMITTED
       16 - 21    TOTEMPOL    TOTAL TRANS SYS EMPLOYEES OP LABOR
Implementing Reorganization

The main goals of reorganization were to supply stand-in values for missing
records and to reformat instances of hierarchical ordering, i.e., where several
variables had been grouped together in one field. A hypothetical example of the
transformations resulting from reorganizing is shown in Figure 1-3.

Data  File  I  illustrates  how  the  problems  discussed  above  appear  in  the  data.  In 
Data  File  I  under  the  field  SYSTEM  ID,  there  is  no  information  present  for  system 
number  1003,  and  systems  1002  and  1004  appear  to  have  only  half  the  information 
they  need.  This  example  illustrates  that  in  the  actual  data  both  whole  and  partial 
cases  are  missing. 

The second problem, hierarchical ordering, can also be seen in Data File I. The
field WAGES contains six different variables, but the values in the fields MODE and
EMPLOYEE CATEGORY must be used to find these variables. For example, the
first WAGES value, 500, has a MODE value of 1 and an EMPLOYEE CATEGORY
value of 0. These values indicate that the first 500 of WAGES is for motor bus
drivers' wages. Hence, the only other value it can be compared to is WAGES of 650,
six lines down in case 1002, which also has a MODE of 1 and EMPLOYEE
CATEGORY of 0. There are six WAGES variables possible because in addition to
the MODE and EMPLOYEE CATEGORY combination of 1 and 0 there are also the
combinations of 1 and 1 or 1 and 2, etc. Because there are two values of MODE and
three values of EMPLOYEE CATEGORY, it takes two times three, or six,
combinations to exhaust all pairs possible and identify all six variables. Because the
values in MODE and EMPLOYEE CATEGORY are required to distinguish among the
six variables clustered in the WAGES field, they are referred to by the functional
term "hierarchical ordering variables."

Data  File  II  in  Figure  1-3  illustrates  how  reorganization  transforms  the  data.  In 
this  file  the  six  WAGES  variables  each  have  their  own  separate  fields.  The 
information  in  MODE  and  EMPLOYEE  CATEGORY  from  File  I  has  been 
incorporated  into  the  new  logical  organization  of  File  II.  Therefore,  they  disappear 
from  File  II.  Data  File  II  also  has  full  sets  of  information  (complete  cases)  for  all 
transit  system  ID  numbers  represented,  although  missing  value  codes  of  999  had  to 
be  inserted  to  make  this  possible.  For  example,  even  though  system  1002  has  no 
trolley  buses,  stand-in  values  of  999  were  inserted  in  the  three  trolley  bus  variables 
in  this  case. 
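
The transformation from Data File I to Data File II can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the master list of system IDs and the function name `reorganize` are hypothetical, and the values are those of the hypothetical example in Figure 1-3.

```python
MISSING = 999  # missing value code used in Figure 1-3

# Data File I: (system ID, mode, employee category, wages).
data_file_1 = [
    (1001, 1, 0, 500), (1001, 1, 1, 600), (1001, 1, 2, 600),
    (1001, 2, 0, 400), (1001, 2, 1, 700), (1001, 2, 2, 700),
    (1002, 1, 0, 650), (1002, 1, 1, 600), (1002, 1, 2, 700),
    (1004, 2, 0, 700), (1004, 2, 1, 0), (1004, 2, 2, 0),
]

MASTER_IDS = [1001, 1002, 1003, 1004]  # every system that should form a case
MODES = [1, 2]                         # 1 = motor bus, 2 = trolley bus
CATEGORIES = [0, 1, 2]                 # 0 = driver, 1 = maintenance, 2 = admin


def reorganize(rows):
    """Turn hierarchical records into one complete case per system."""
    lookup = {(sid, mode, cat): wages for sid, mode, cat, wages in rows}
    cases = []
    for sid in MASTER_IDS:
        # One field per (mode, category) pair; absent pairs get the stand-in.
        values = [lookup.get((sid, m, c), MISSING) for m in MODES for c in CATEGORIES]
        cases.append([sid] + values)
    return cases


data_file_2 = reorganize(data_file_1)
```

The MODE and EMPLOYEE CATEGORY values no longer appear in the output because they are encoded in the position of each wage field.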




FIGURE 1-3. HYPOTHETICAL DATA FILE BEFORE
AND AFTER REORGANIZATION


DATA FILE I. HIERARCHICAL ORGANIZATION

              EMPLOYEE
SYSTEM ID   MODE   CATEGORY   WAGES
1001        1      0          500
1001        1      1          600
1001        1      2          600
1001        2      0          400
1001        2      1          700
1001        2      2          700
1002        1      0          650
1002        1      1          600
1002        1      2          700
1004        2      0          700
1004        2      1          000
1004        2      2          000

MODE
1 = MOTOR BUS
2 = TROLLEY BUS

EMPLOYEE CATEGORY
0 = DRIVER
1 = MAINTENANCE
2 = ADMINISTRATION


DATA FILE II. STATISTICAL ORGANIZATION

             MOTOR BUS WAGES            TRBUS WAGES
SYSTEM ID    DRIVER   MAINT   ADMIN     DRIVER   MAINT   ADMIN
1001         500      600     600       400      700     700
1002         650      600     700       999      999     999
1003         999      999     999       999      999     999
1004         999      999     999       700      000     000

999 = MISSING VALUE CODE

In general, the basic reorganization steps can be reduced to four:

1.  Data  were  read  as  single  records  and  unwanted  information  was 
eliminated. 

2.  The  positions  in  the  retained  data  needing  stand-in  values  were  located. 

3.  The  stand-in  values  were  inserted. 

4.  The  data  were  formatted  with  a  new  logical  organization  (codebook) 
which  considered  all  the  records  belonging  to  a  single  transit  system  as  a 
statistical  case. 

Although this four-step summary is useful at a general level, it understates the
work involved in the reorganization process. Discussion of the programming
procedures required to implement the reorganization is more suggestive of the
actual scope of the task.

Programming  Required  for  Reorganization 

The  basic  strategy  adopted  was  to  process  one  data  file  from  the  tape  at  a 
time,  selecting  the  variables  that  would  be  required  for  the  projected  analysis, 
reorganizing  them,  and  adding  them  cumulatively  to  a  master  data  file.  Figure  1-4 
gives  a  summary  of  the  programming  procedures  required  to  reorganize  the 
information  in  a  single  data  file. 

Three reorganizing requirements were met with the first set of programming
steps. Since only a subset of the information available in each file was actually
used, it was economically advantageous to eliminate all but necessary information
from future processing steps. Hence the first step was to selectively read only
those records which were to be retained. Next, since succeeding steps depended on
the transit systems' data being in uniform order, the retained data were sorted in
ascending numerical order by transit system ID number. Finally, since it was likely
that each variable being processed was missing a different set of needed stand-in
values, each variable and its accompanying set of transit system ID numbers was
written out to a separate disk file. At this juncture, Step 1 of the basic
reorganization process is complete.

The second step involved the identification of the systems that required the
insertion of stand-in values or "dummy records" for the variables of interest. The
identification step was accomplished by comparing the ID numbers present for a
variable to a master list of ID numbers. The result of this step was another disk file
containing the ID numbers missing for that variable.
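
A minimal Python sketch of the identification step follows. The project used a FORTRAN program for this comparison; the function name `find_missing_ids` and the ID values below are assumptions chosen only to illustrate the logic.

```python
def find_missing_ids(master_ids, variable_ids):
    """Return master-list IDs that have no record for this variable."""
    present = set(variable_ids)
    # Walking the master list keeps the result in master-list order.
    return [sid for sid in master_ids if sid not in present]


master = [1001, 1002, 1003, 1004]    # hypothetical master list of system IDs
ids_with_value = [1001, 1002, 1004]  # IDs present for one variable
missing_ids = find_missing_ids(master, ids_with_value)
```

The list of missing IDs plays the role of the second disk file described above.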
FIGURE 1-4. SUMMARY OF PROGRAMMING PROCEDURES

The insertion of the stand-in values, the next move in reorganization, required
two steps, referred to collectively as a Merge/Sort routine. First, two data files,
one containing ID numbers and variable values and another containing the now
identified missing IDs, were merged together electronically. Then, because in the
merging the required ascending numerical order is no longer preserved, the merged
files were re-sorted. When these two steps were complete, a data file with a
complete set of ID numbers resulted, although IDs which were identified by comparison
to the master list were dummy records having blanks in the variable value position.
(In a later routine a missing value designation, -9, was inserted in the blank.)
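
The Merge/Sort routine can be sketched in Python as follows. The function name `merge_sort_fill` and the sample values are hypothetical, and the sketch folds the later insertion of the -9 code into the same function for brevity, whereas the text describes it as a separate routine.

```python
MISSING_VALUE = -9  # missing value designation named in the text


def merge_sort_fill(variable_file, missing_ids):
    """Merge dummy records into a variable file, re-sort, and fill blanks."""
    dummies = [(sid, None) for sid in missing_ids]  # blank value position
    merged = variable_file + dummies                # merging breaks ID order
    merged.sort(key=lambda rec: rec[0])             # restore ascending ID order
    return [(sid, MISSING_VALUE if v is None else v) for sid, v in merged]


# Hypothetical variable file: (system ID, value) pairs already sorted by ID.
variable_file = [(1001, 400), (1004, 700)]
filled = merge_sort_fill(variable_file, [1002, 1003])
```

After this step every variable file lists the same IDs in the same order, which is what the later collating step depends on.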

When  all  the  variables  of  interest  originating  in  the  same  tape  file  have  been 
processed  to  this  point,  they  are  reunited  in  a  single  data  file  by  a  collating  routine. 
This  step  results  in  a  complete,  sorted  data  set  containing  all  the  variables.  At  this 
juncture,  three  of  the  four  basic  reorganization  steps  have  been  accomplished. 
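
A collating routine of this kind might look like the Python sketch below. The project's collation was done by a FORTRAN program; the function name `collate`, the variable names and the values shown are assumptions for illustration only.

```python
def collate(variable_files):
    """Combine per-variable files into one row per system ID."""
    names = list(variable_files)
    columns = [variable_files[name] for name in names]
    id_lists = [[sid for sid, _ in col] for col in columns]
    # Collation is only valid if every file lists the same IDs in the same order.
    if any(ids != id_lists[0] for ids in id_lists):
        raise ValueError("variable files are not in the same ID order")
    cases = []
    for i, sid in enumerate(id_lists[0]):
        cases.append([sid] + [col[i][1] for col in columns])
    return names, cases


# Hypothetical variable files, each complete and sorted by ID.
files = {
    "VAR_A": [(1001, 500), (1002, 650), (1003, -9)],
    "VAR_B": [(1001, 400), (1002, -9), (1003, -9)],
}
names, cases = collate(files)
```

The order check makes explicit why the earlier sorting and dummy-record steps are prerequisites: without them the values in one row would belong to different systems.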

In  a  final  step,  the  complete,  sorted  dataset  is  added  to  the  master  file.  At  this 
stage  each  transit  system  has  a  uniform  number  of  records.  This  manufactured 
uniformity  is  what  allowed  the  imposition  of  the  new  logical  organization,  in 
actuality  a  new  formatting  scheme,  which  identified  for  the  computer  the  several 
variables  embedded  in  a  single  field.  When  all  the  variables  required  for  analysis 
had  been  processed  in  this  way,  the  reorganization  step  was  complete. 

Each data file handled presented special characteristics which required special
treatment. Not all files required as many steps as described while others required
many more. Four major variations in the data reorganization procedure emerged in
practice and are discussed in detail in the Technical Report.^22

The  DECsystem-10  conversion  of  SPSS  was  used  for  nearly  all  programming 
steps.  Two  FORTRAN  programs,  one  for  identifying  missing  ID  numbers  and 
another  for  collating  variable  files,  were  also  required. 

^22 Gordon J. Fielding, Mary E. Brenner, and Olivia de la Rocha, Using Section
15 data for transit performance analysis. Interim report No. UMTA-CA-11-0026-1.
(Irvine, Calif.: University of California, Institute of Transportation Studies, January
1983.), pp. 11-20.

One other problem of note arose in the handling of the Section 15 variables.
This problem concerned the inability of single precision software (such as the
DECsystem-10 conversion of SPSS) to handle field widths exceeding eight columns.
The eleven-column wide variables found in the expense files, for example, set up a
variety of problems and barriers that had to be circumvented. SPSS cannot write
out a single numeric field wider than eight columns and warns of distortions in
accuracy when reading variables exceeding that limit, although tests have shown
those distortions to be minor. Binary and alphanumeric formatting are temporary
remedies to the problem, but never solve it. The use of double-precision software
would simplify the handling of Section 15 data, and we recommend its use where
possible.
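
The size of the distortion can be illustrated with a short Python sketch. It uses the IEEE 754 32-bit format as a stand-in for single precision; this is an assumption, since the DECsystem-10 used its own floating-point format, and the eleven-digit expense value is invented for illustration.

```python
import struct


def to_single_precision(value):
    """Round-trip a number through a 32-bit IEEE 754 float."""
    packed = struct.pack(">f", float(value))
    return struct.unpack(">f", packed)[0]


expense = 12345678901  # hypothetical eleven-digit expense value
stored = to_single_precision(expense)
error = abs(stored - expense)
relative_error = error / expense  # small relative to the value itself
```

A seven-digit value survives the round trip exactly in this format, while the eleven-digit value does not, although the relative error remains very small. This is in keeping with the observation that the distortions are minor.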

In  all,  twenty-three  separate  data  files  and  147  variables  were  prepared  by  the 
reorganization  sequence  discussed  above.  Appendix  A  summarizes  the  files  accessed 
and  variables  retrieved. 

DATA  PREPARATION 

Once  the  data  were  reorganized,  additional  data  preparation  was  required 
before  analysis  could  commence.  There  were  three  steps  to  preparing  the  data: 
calculating  basic  variables,  identifying  and  flagging  missing  information  and 
validating  existing  data. 

Calculating  Basic  Variables 

The  Section  15  database  contained  a  wealth  of  information  which  was  too 
detailed  for  our  purpose.  It  was  necessary  to  aggregate  many  small  pieces  of 
information  into  more  comprehensive  variables  which  contained  only  information 
about  the  motorbus  mode  and  which  were  applicable  to  an  entire  year's  operation. 
The  building  blocks  for  this  process  are  listed  in  Appendix  A.  The  final  sets  of 
variables  are  listed  in  Table  2-1  and  in  the  text  of  Chapter  3.  The  remainder  of  this 
section  outlines  the  major  steps  used  to  calculate  the  variables  used  in  the  analyses. 

The  information  about  transit  employees  was  summarized  into  broader 
categories.  Ten  employee  categories  are  reported  in  Section  15:  three  in  vehicle 
operations  (i.e.,  supervisors,  revenue  vehicle  operators  and  support  personnel),  five 
in  maintenance  and  two  in  general  administration.  These  ten  categories  are  further 
subdivided  into  capital  labor  and  operating  labor.  Analysis  for  this  project  required 
only  the  number  of  vehicle  operators,  the  number  of  maintenance  employees  and  the 
number  of  administrative  employees.  The  first  step  in  creating  these  variables  was 
to  add  together  operating  and  capital  employees  since  we  were  not  interested  in  this 
distinction.  At  this  point,  the  number  of  revenue  vehicle  operators  was  ready  for 
use.  The  number  of  maintenance  employees  was  calculated  by  adding  together  the 
five  categories  of  maintenance  employees.    The  number  of  administrators  was 
calculated by adding together the supervisory personnel in vehicle operations and
maintenance to the two categories of administrative personnel.

Other variables which underwent a similar aggregation process were the total
number  of  accidents  (combining  all  categories  of  collision  and  non-collision 
accidents),  total  amounts  of  subsidies  (combining  local,  state  and  federal)  and  the 
miles  of  line  used  on  bus  routes  (combining  mixed  right-of-way  and  one-way 
directional). 

The numbers of peak and mid-day vehicles were derived from several
sources. Although this information can be reported directly on Form 406, transit
systems with a peak to base ratio of one were not required to report their numbers
of vehicles for different time periods. For systems missing this information, the
number of vehicles was calculated by substituting the number of vehicle operators
scheduled for weekdays or the number of vehicles operating on an average weekday.
To further assure that this was done only for systems with a peak to base ratio of
one, other sources of published information were cross-checked, including APTA
reports,[23, 24] other Section 15 reports[25] and state reports[26] to validate
peak to base ratios.

[23] American Public Transit Association, Operating statistics report 1981:
Transit system operating statistics for calendar/fiscal year 1980. (Washington,
D.C.: American Public Transit Association, October 1981.)

[24] American Public Transit Association, Operating statistics report 1980:
Transit system operating statistics for calendar/fiscal year 1979. (Washington,
D.C.: American Public Transit Association, October 1980.)

[25] U.S. Department of Transportation, Transportation Systems Center,
National urban mass transportation statistics: 1981 Section 15 report. Report No.
UMTA-MA-06-0107-83-1. (Springfield, Va.: National Technical Information
Service, November 1982.)

[26] State of California, Office of the Controller, Financial transactions
concerning transit operators and non-transit claimants under Transportation
Development Act: Annual report for fiscal year 1980-1981. (Sacramento, Calif.:
State of California, Office of the Controller, 1982.)

The data on service supplied by a transit agency and service consumed by
passengers underwent a special calculation to annualize them. While the Section 15
reporting system requires that all financial data be reported for a complete fiscal
year, information on service variables such as unlinked passenger trips and revenue
vehicle hours was collected by a sampling procedure and reported for an "average
weekday," an "average Saturday" and an "average Sunday." This information was
combined using a formula which annualized it so that it was comparable to the
financial data. The formula allowed for 253 weekdays, 53 Saturdays, 52 Sundays and
7 holidays (counted as Sundays), with each of these day counts multiplied by the
reported value for the average weekday, Saturday or Sunday. This is the same formula
used by TSC in the Annual Report. However, for these data the formula was
combined with a validation process, so a few values differ from those in the Annual
Report.
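
The annualization formula can be sketched as follows. This is a minimal illustration, not the project's actual program: the day counts come from the text above, while the function and constant names are hypothetical.

```python
# Day counts stated in the text; holidays are counted as Sundays.
WEEKDAYS, SATURDAYS, SUNDAYS, HOLIDAYS = 253, 53, 52, 7


def annualize(avg_weekday, avg_saturday, avg_sunday):
    """Convert average-day service values into an annual total."""
    return (WEEKDAYS * avg_weekday
            + SATURDAYS * avg_saturday
            + (SUNDAYS + HOLIDAYS) * avg_sunday)
```

The four day counts sum to 365, so a system reporting the same value for every day type is simply multiplied by the number of days in the year.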

A series of calculations was also needed to disaggregate the data so that they
applied only to the motorbus mode. Revenue and subsidy information is reported in
Section 15 for the entire transit system, not by mode. In addition, multimodal
systems have the option of reporting expenses as joint expenses between modes, and
a few systems report most of their expenses in this way. A series of weighting
formulas was designed to assign revenues or joint expenses to specific modes. For
example, a proportion of passenger revenue was assigned to the motor bus mode by
multiplying the system's total passenger revenues by the ratio of motor bus
passengers to total passengers. Although the resulting values are only estimates,
they are an improvement on the distortions caused by using overly large figures or
dropping the multi-modal systems (32% of the systems reporting in 1980) from the
analysis. Appendix B summarizes which variables were weighted and the procedure
used. All later analyses were done twice, with weighted and unweighted variables,
to assure that the results were not an artifact of the weighting procedure.
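
The passenger-revenue example above can be sketched as a simple proportional allocation. The function name and the handling of a zero passenger total are assumptions made for illustration; Appendix B, not this sketch, documents the weights actually used.

```python
def motorbus_share(system_total, bus_passengers, total_passengers):
    """Estimate the motor bus portion of a system-wide amount.

    The system total is weighted by the ratio of motor bus passengers to
    total passengers. Returns None when no passenger total is available,
    since no ratio can be formed (an assumption of this sketch).
    """
    if total_passengers <= 0:
        return None
    return system_total * bus_passengers / total_passengers
```

For a system with $1,000,000 in passenger revenue and three quarters of its passengers on motor buses, the estimate is $750,000.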

Detection  of  Missing  Data 

The second phase of preparing data for analysis was the detection of cases with
missing data, which therefore needed to be eliminated from further analysis. A
database  prepared  for  statistical  analysis  will  usually  have  a  special  symbol  such  as 
-9  which  indicates  that  information  is  missing.  However,  the  Section  15  data  tape 
had  no  such  special  symbol.  Cases  with  missing  data  had  either  a  zero  or  blank. 
Since  there  can  be  "real"  zeroes  (e.g.,  a  system  may  have  no  local  subsidies),  it  was 
necessary  to  differentiate  "real"  zeroes  from  missing  data  zeroes.  Thus  a  missing 
data  symbol  had  to  be  inserted  during  the  process  of  calculating  the  variables.  It 
was  possible  to  detect  the  missing  data  problems  by  considering  the  logical 
properties  of  specific  variables,  by  comparing  a  variable  to  other  information  in  the 
data  base  and  by  comparing  the  Section  15  data  to  other  sources  of  information. 

For some variables, detecting missing data was straightforward and quite
logical.  For  instance,  a  transit  system  which  had  no  operating  expenses  was 
assumed  to  have  a  missing  data  problem.  Other  such  variables  were  revenue  vehicle 
drivers,  operating  subsidies,  total  vehicle  hours,  total  wages,  etc. 

But  most  variables  required  more  judgment  on  the  part  of  the  project  staff.  It 
is  possible  for  a  transit  system  to  have  zero  accidents  for  a  given  fiscal  year,  but 
this  is  unlikely  for  large  systems.  Other  transit  systems  of  similar  size  to  the  one 
reporting  zero  accidents  were  examined  to  see  if  zero  was  a  possible  number.  A 
cross-year  comparison  of  reported  accidents  supplied  further  evidence  on  which  to 
base  a  decision.  It  was  decided  for  this  project  that  any  systems  with  more  than  ten 
revenue  vehicles  could  not  have  zero  accidents,  and  a  missing  data  symbol  was 
inserted for these systems. Smaller systems were then judged individually, taking
into account the number of peak vehicles required (a better measurement of size
than revenue vehicles), their safety record in other years as reported in Section 15
Reports or APTA reports, and the performance of like-sized systems.
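
The accident rule can be sketched as below. The threshold of ten revenue vehicles and the -9 missing-data symbol come from this chapter; the function name and the "needs review" flag for small systems are hypothetical devices standing in for the staff's case-by-case judgment.

```python
MISSING = -9  # missing-data symbol used for this project


def flag_accidents(accidents, revenue_vehicles):
    """Return (value, needs_review) for a reported accident count."""
    if accidents == 0 and revenue_vehicles > 10:
        # Zero accidents was treated as impossible for these systems.
        return MISSING, False
    if accidents == 0:
        # Small system: zero may be real, so it is judged individually.
        return 0, True
    return accidents, False
```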

Some  judgments  about  missing  data  involved  making  decisions  about  whether  a 
concept  was  adequately  measured  by  a  combination  of  several  different  variables. 
For  instance,  vehicle  maintenance  could  be  supplied  by  employees  on  the  transit 
agency  payroll  or  by  contract  with  other  organizations.  Thus  if  a  system  reported 
zero  maintenance  employees,  the  system  was  expected  to  have  zero  maintenance 
wages  reported  but  a  substantial  expenditure  for  services  indicated  under  either  the 
maintenance  function  or  general  administration.  In  the  absence  of  wages  and 
service  expenses,  a  missing  data  symbol  was  used  to  indicate  that  maintenance 
expenses  were  missing. 

For  some  other  variables,  the  decision  was  more  complex  because  a  zero  value 
could  be  a  real  value  or  it  could  be  an  indication  of  a  problem.  The  example  of  total 
vehicle  miles  will  make  this  clear.  Total  vehicle  miles,  as  noted  above,  is 
constructed  from  three  variables — average  weekday  miles,  Saturday  miles  and 
Sunday  miles.  If  weekday  miles  were  zero,  it  was  assumed  that  information  was 
missing.  However,  many  systems  do  not  offer  weekend  service,  so  a  zero  for 
Saturday  or  Sunday  miles  might  be  real  or  might  be  an  indication  of  a  problem. 
Since  this  information  was  based  upon  a  time  consuming  sampling  procedure,  there 
was  a  definite  possibility  that  a  transit  system  failed  to  collect  this  information,  and 
thus  had  a  missing  data  problem.  The  Section  15  data  tape  included  information 
about  the  service  schedule  of  each  system.  Therefore,  it  was  possible  to  determine 
if a system offered Sunday service or not, and thus whether it had a missing data
problem  or  not. 
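
The distinction between a real zero and a missing-data zero for daily mileage can be sketched as follows. The function and argument names are hypothetical, and the sketch assumes blanks on the tape have already been read as zero.

```python
MISSING = -9  # missing-data symbol used for this project


def classify_daily_miles(miles, service_scheduled):
    """Interpret a reported average-day mileage value.

    service_scheduled is True when the service schedule shows that the
    system operates on that day type. Weekday service is always expected,
    so weekday values are checked with service_scheduled=True.
    """
    if miles > 0:
        return miles
    # A zero is real only if no service was scheduled for that day type.
    return MISSING if service_scheduled else 0
```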

The  problem  of  missing  data  received  detailed  attention  because  it  is  an 
inevitable  problem  in  a  data  base  as  complex  as  the  one  mandated  by  Section  15. 
Over  300  different  systems  must  learn  to  interpret  and  fill  out  numerous 
forms — ranging  from  17  pages  for  a  small,  single  mode  system  to  90  pages  for  a 
large,  multi-modal  system.  Since  1980  was  only  the  second  year  in  which  this 
information  was  reported,  some  systems  were  still  in  the  process  of  instituting 
accounting systems compatible with Section 15 requirements.

Data  Validation 

The  final  phase  of  data  preparation  consisted  of  cross-checking  the  data  for 
validity. Errors could enter the database in many ways: misinterpretation by a
transit system of what number should be reported, miscalculation of totals, and
keypunching errors as data are prepared for the computer. Four major methods were
used  to  validate  the  data:  recomputation  of  totals,  comparisons  of  redundant 
information,  comparisons  of  related  information  and  comparison  to  feasible  value 
ranges.  An  example  of  each  of  these  methods  with  specific  variables  will  be  given. 

The  total  number  of  employees  reported  for  each  system  was  compared  to  the 
sum  of  the  separate  categories.  In  about  ten  cases,  the  totals  differed  by  more  than 
could  be  accounted  for  by  rounding  errors.  In  most  cases  the  differences  were 
apparently  caused  by  keypunching  errors  (e.g.,  reversal  of  digits)  or  simple 
miscalculations.  For  these  cases,  reported  totals  were  replaced  by  the  recalculated 
totals  and  cross-checks  made  with  the  Annual  Reports.  Revenue,  subsidy  and 
expense  totals  were  also  checked. 
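
Recomputation of totals can be sketched as below. The tolerance of one unit for rounding error is an assumption of this sketch, as are the names; the text says only that differences larger than rounding could explain were corrected.

```python
def reconcile_total(reported_total, categories, tolerance=1.0):
    """Compare a reported total with the sum of its categories.

    Returns (value_to_use, was_replaced). The recomputed sum replaces the
    reported total only when the gap exceeds the rounding tolerance.
    """
    recomputed = sum(categories)
    if abs(reported_total - recomputed) > tolerance:
        return recomputed, True
    return reported_total, False
```

A reported total of 251 against categories summing to 215 (a reversal of digits) would be replaced, while 215.4 would be kept as a rounding difference.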

Much  of  the  financial  data  was  reported  in  several  different  places.  For 
instance, the Revenue Summary Schedule (Form 201) summarized the information on
the Revenue Subsidiary Schedule (Form 203). Total operating expenses were also
reported  in  two  different  places  on  the  magnetic  tape.  A  simple  comparison  of 
these  numbers  revealed  a  few  differences  and  the  correct  number  was  identified  by 
the  other  validation  methods. 

Different  variables  in  the  database  are  sometimes  different  measures  of  the 
same  thing.  For  instance,  employee  counts  and  employee  wages  are  two  different 
measures  of  labor  utilization.  If  a  transit  system  has  a  large  number  of  vehicle 
operators,  it  must  have  a  proportionately  large  amount  of  vehicle  operator  wages. 
However, caution must be used in some of these comparisons. Maintenance
employee counts and maintenance wages were sub-divided into distinct,
non-comparable sub-groups, so only the totals were comparable.

The  final  method  of  identifying  mistakes  was  to  look  for  values  that  lay  outside 
an  expected  range  for  that  specific  variable.  This  method  worked  best  for  measures 
that  were  combinations  of  two  variables  such  as  miles  per  hour  or  cost  per 
passenger.  Miles  per  hour  (speed)  has  an  expected  range  of  about  5  miles  per  hour 
(dense  urban  areas)  to  50  miles  per  hour  (commuter  service).  Any  system  that  fell 
outside  this  range  or  was  in  the  wrong  part  of  the  range  for  the  kind  of  service  it 
offered  probably  had  a  mistake  in  either  its  measure  of  miles  or  hours. 
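
The feasible-range check for speed can be sketched as follows. The bounds of 5 and 50 miles per hour come from the text; the function name and the treatment of zero hours as a failure are assumptions.

```python
def speed_out_of_range(vehicle_miles, vehicle_hours, low=5.0, high=50.0):
    """Return True when miles per hour falls outside the feasible range."""
    if vehicle_hours <= 0:
        # No speed can be computed, so the case needs checking anyway.
        return True
    speed = vehicle_miles / vehicle_hours
    return not (low <= speed <= high)
```

This covers only the outer bounds. Whether a system sits in the wrong part of the range for its kind of service remains a judgment that the sketch does not attempt.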

A  variable  such  as  cost  per  passenger  was  a  little  more  difficult  to  work  with 
since  inflation  and  difference  in  fiscal  years  caused  the  feasible  range  to  change 
over  time  and  the  boundaries  of  a  feasible  range  were  indefinite.  In  this  instance  all 
cases  were  examined  which  lay  more  than  three  standard  deviations  from  the  mean 
as  well  as  the  largest  and  smallest  cases.  While  some  of  these  outliers  had  apparent, 
real  causes,  such  as  extremely  long  trip  lengths,  others  were  so  different  from  the 
norm  that  they  were  obviously  wrong.  In  these  cases  we  looked  for  the  correct 
values  in  other  parts  of  the  database,  or  in  other  sources.  If  a  correction  was 
impossible,  incorrect  values  were  designated  as  missing. 
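
The selection of cases for review can be sketched as below. The three-standard-deviation rule and the inclusion of the largest and smallest cases follow the text; the use of the sample standard deviation and all names are assumptions of this sketch.

```python
from statistics import mean, stdev


def cases_to_review(values, n_sd=3.0):
    """Return sorted indices of cases to examine by hand.

    Flags values more than n_sd sample standard deviations from the mean,
    plus the single largest and smallest cases. Needs at least two values.
    """
    m, s = mean(values), stdev(values)
    flagged = {i for i, v in enumerate(values)
               if s > 0 and abs(v - m) > n_sd * s}
    indices = range(len(values))
    flagged.add(min(indices, key=values.__getitem__))
    flagged.add(max(indices, key=values.__getitem__))
    return sorted(flagged)
```

The function only nominates cases. Deciding whether an outlier has a real cause, such as extremely long trip lengths, still requires the comparison with other sources described above.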

DATA  EVALUATION 

Once  the  data  were  in  a  form  ready  for  statistical  analysis,  it  was  necessary  to 
select  the  best  variables  for  the  ensuing  analyses.  Once  the  variables  were  chosen, 
it  was  then  necessary  to  evaluate  the  distributional  characteristics  of  each  variable 
in  order  to  select  the  appropriate  statistical  technique.  Finally,  the  sample  of 
transit  systems  with  sufficient  data  to  enter  into  the  analyses  had  to  be  carefully 
described  in  terms  of  how  well  they  represented  the  entire  set  of  transit  systems 
that  were  included  in  the  Section  15  reporting  system  for  FY  1980. 

Evaluation  of  Variables 

Some  kinds  of  variables  were  more  likely  to  have  missing  data  than  others.  In 
FY  1980  the  most  complete  data  were  available  for  economic  variables  such  as 
operating  expenses  and  passenger  revenue  (Table  1-2). 

TABLE 1-2. THE DISTRIBUTION OF MISSING DATA IN SELECTED TRANSIT VARIABLES

Variable

Passenger Revenue
Total Operating Expense
Total Employees
Total Vehicle Miles
Unlinked Passenger Trips
Passenger Miles

% missing values out of 304

24.0%
18.1%
0.7%
2.0%
8.2%
2.6%


The  most  incomplete  information  was  for  passenger  measures  such  as  unlinked 
passenger  trips  and  passenger  miles.  Since  service  utilization  is  a  major  facet  of 
transit  performance,  it  was  necessary  to  keep  some  measure  of  this  concept 
although  it  would  result  in  more  systems  being  excluded  from  the  analysis.  Unlinked 
passenger  trips  were  chosen  because  this  variable  was  the  most  complete  measure  of 
utilization  and  seemed  less  prone  to  measurement  error. 

Other  variables  seemed  relatively  complete  but  the  validity  checks  described  in 
the  earlier  section  revealed  that  there  were  problems  with  the  values  reported.  In 
many  instances  there  was  not  enough  information  to  cross-check  the  values  or  to 
correct  them.  The  following  variables  had  severe  enough  problems  that  they  were 
eliminated  from  the  final  analyses: 

1. Active vehicles: The number of vehicles actually used to provide service
during the fiscal year was obtained from the Revenue Vehicle Inventory. However, the
information  in  the  Revenue  Vehicle  Inventory  was  incomplete  and  had  numerous 
mistakes.  Thus  information  was  not  available  for  each  vehicle  in  the  fleet  of  some 
systems.  Other  systems  had  more  active  vehicles  than  they  actually  owned. 
Numerous  mistakes  in  the  designation  of  mode  resulted  in  vans  being  considered 
motor  buses  and  vice  versa.  Since  about  a  third  of  the  transit  systems  had  major 
errors  on  this  item,  it  was  not  used  in  the  analyses. 

2.  Fuel:  There  were  four  different  fuels  reported  in  use  with  motor  buses: 
diesel  fuel,  gasoline,  bunker  fuel  and  liquid  natural  gas.  A  number  of  transit  systems 
had  combinations  of  fuels.  A  major  problem  was  that  bunker  fuel  is  not  normally 
considered  a  motor  bus  fuel  so  several  systems  had  a  coding  error  on  fuel.  It  was 
also hard to compare the efficiency of different fuels. In addition, for those
systems using several kinds of fuel, there was no way to allocate miles to one fuel or
another.  Although  mileage  information  should  have  been  available  on  the  Revenue 
Vehicle  Inventory,  problems  with  that  data  precluded  its  use  in  this  instance.  Thus 
measures  of  fuel  efficiency  were  eliminated  from  the  final  analysis. 

3.  Subsidies:  Transit  systems  did  not  use  consistent  definitions  for  designating 
whether  particular  subsidies  were  from  state,  local  or  federal  sources.  For  instance, 
in  California  half  of  the  transit  systems  called  the  subsidies  from  a  particular  source 
local  funding,  while  others  called  it  state  funding.  Since  subsidy  programs  vary  from 
state  to  state,  there  was  no  comparability  in  these  definitions.  For  the  analysis,  all 
sources  of  subsidies  have  been  combined  and  only  measures  of  total  subsidies  used. 

4.  Miles  of  line  (route  miles):  The  definitions  used  by  the  Section  15  reporting 
systems  for  measuring  miles  of  line  were  confusing  and  interpreted  differently  by 
different  systems.  The  excessive  variance  in  reported  miles  made  this  variable 
unreliable  for  statistical  analysis  and  it  was  eliminated. 

5.  Roadcalls:  As  with  miles  of  line,  the  definitions  for  roadcalls  were  not 
consistent  across  transit  systems  and  this  variable  was  not  used. 

6.  Maintenance  expenses:  Since  transit  systems  can  do  maintenance  in-house  or 
through  purchased  services,  variables  relating  to  maintenance  must  be  used  with 
caution. Transit systems with no reported maintenance expenses were eliminated.
Those  remaining  in  the  analysis  have  a  variety  of  maintenance  arrangements  and  any 
one  measure  of  maintenance  efficiency  may  not  be  sufficient  to  represent  the 
situation  for  all  systems. 

Evaluation  of  the  Distribution  of  Variables 

Most  of  the  basic  variables  were  not  normally  distributed  and  thus  certain  kinds 
of  statistical  analyses  had  to  be  used  with  caution.  There  were  many  more  small 
transit  systems  (less  than  25  vehicles)  than  large,  so  the  distributions  were  very 
peaked  at  the  small  end  of  the  scale  for  variables  which  reflect  the  size  of  a  transit 
system,  such  as  number  of  peak  vehicles,  operating  expenses  and  subsidies.  There 
were  also  a  few  very  large  transit  systems,  such  as  New  York,  Chicago  and  Los 
Angeles,  that  were  so  much  larger  than  the  others  that  they  were  outliers  causing 
the  distribution  of  variables  to  be  very  skewed.  These  systems  were  too  important 
in  terms  of  the  amount  of  transit  they  provide  to  eliminate  them  from  the  analysis. 
Thus  statistical  methods  had  to  be  chosen  which  minimized  the  influence  of  these 
outliers or the variables had to be transformed so that they met more of the
assumptions of the statistical techniques used. Both of these approaches were used
and are described in later chapters where relevant. More information on the
distributional properties of specific variables is available in another paper.[27]

Evaluation  of  the  Sample 

The  cases  with  insufficient  data  to  be  included  in  the  analyses  were  not 
randomly  distributed  throughout  the  sample  of  transit  systems  included  in  the 
Section  15  data  base.  The  missing  data  situation  was  particularly  acute  for  small 
systems — those  with  fewer  than  25  revenue  vehicles.  Thirty  per  cent  of  these 
systems  were  missing  information  on  passenger  trips  and  six  per  cent  on  expenses. 
Although  the  analyses  still  included  substantial  data  on  smaller  systems,  since  over 
one  third  of  the  systems  reporting  data  fall  into  this  size  category,  generalizations 
to  all  small  systems  must  be  made  cautiously. 

Since the results of this project are being compared to an earlier project,[28] it
was  necessary  to  examine  the  issue  of  how  stable  the  sample  of  transit  systems  was 
between  FY  1979  and  FY  1980.  About  18%  of  the  transit  systems  differed  between 
the  two  years — with  some  cases  dropping  out  and  new  ones  entering  into  the 
sample.  This  changeover  was  most  common  among  the  smallest  systems.  So  cross- 
year  comparisons  must  be  made  with  caution,  particularly  for  the  smallest  systems. 

CONCLUSION 

Data used in the analysis of transit performance were based upon the Section 15
data  but  are  not  identical  with  those  reported  on  the  TSC  tape  or  in  the  Annual 
Report  for  FY  1980.  Obvious  errors  have  been  corrected  and  missing  entries  have 
been  designated  with  the  -9  symbol.  Some  agencies  were  eliminated  because  either 
there  were  too  many  missing  items  or  obviously  inconsistent  values  could  not  be 
verified.  The  data  base  was  reorganized  into  a  format  suitable  for  statistical 
analysis  in  preparation  for  the  factor  analysis  reported  in  the  next  chapter. 


[27] Gordon J. Fielding, Mary E. Brenner, and Olivia de la Rocha, Using Section
15 data: Adapting and evaluating the magnetic tape version for statistical analysis.
Working paper no. 83-6. (Irvine, Calif.: University of California, Institute of
Transportation Studies, December 1983.)

[28] Shirley C. Anderson and Gordon J. Fielding, Comparative analysis of transit
performance. Final report No. UMTA-CA-11-0020-1. (Irvine, Calif.: University of
California, Institute of Transportation Studies, January 1982.) (NTIS No.
PB82-196478.)

CHAPTER 2

IDENTIFYING  KEY  PERFORMANCE  INDICATORS 


INTRODUCTION 

Section 15 of the Urban Mass Transportation Act of 1964, as amended, has
provided for the collection of a unique set of comparable transit statistics by
requiring all urban transit applicants for operating assistance to provide a uniform
set of information about their transit systems. The first year of Section 15 reported
statistics [FY 1978-79] was used by Anderson and Fielding[1] to test the performance
concept model developed by Fielding, Glauthier and Lave.[2] A set of nine
performance  indicators  was  selected,  representing  the  three  dimensions  of  transit 
performance.  However,  serious  questions  were  raised  about  the  validity  and 
completeness  of  the  first  year's  data.  Only  98  agencies  out  of  311  could  be  used  in 
the  final  factor  analysis.  The  rest  were  dropped  because  of  missing  and  imprecisely 
reported  data.  Other  questions  were  raised  by  reviewers  about  the  validity  of  the 
indicators  selected  based  upon  a  single  factor  analysis  solution.  Although  the  results 
had  not  previously  been  satisfying,  the  method  of  using  factor  analysis  to  identify 
clusters  of  variables  and  performance  indicators  held  promise.  If  the  data  set  could 
be  improved,  using  the  data  techniques  described  in  Chapter  1,  then  more  rigorous 
factor  analytic  solutions  could  be  applied  on  different  versions  of  the  data  to  test 
the  validity  of  the  performance  model. 

Following  the  data  cleaning  and  verifying  routines  outlined  in  Chapter  1,  data 
from the second year of reported statistics [FY 1980] were analyzed. Chapter 2
addresses two issues. First, it replicates the methods and compares results to the
first year [FY 1979] statistical analysis. Second, a thesis is advanced that there exists
a  highly  consistent  set  of  performance  concepts  relevant  to  fixed  route  transit 
operators  and  a  small,  unique  subset  of  performance  indicators  that  are  useful  for 
performance  evaluation  by  individual  transit  managers  for  systems  of  all  sizes. 


[1] Shirley C. Anderson and Gordon J. Fielding, Comparative analysis of transit
performance. Final report No. UMTA-CA-11-0020-1. (Irvine, Calif.: University of
California, Institute of Transportation Studies, January 1982.) (NTIS No.
PB82-196478.)

[2] Gordon J. Fielding, Roy E. Glauthier and Charles A. Lave, Performance
indicators for transit management. Transportation, 1978, 7(4), 365-379.

Results from the analyses undertaken here are compared to previous research and
suggestions  are  offered  for  the  use  of  the  seven  key  performance  indicators 
identified  as  being  the  most  useful  for  cross-sectional  analysis. 

Emphasis  is  given  to  describing  the  sequence  of  steps  used  to  explore  the  thesis 
that  a  highly  consistent  set  of  performance  concepts  exists  and  that  they  can  be 
represented  by  a  small,  unique  set  of  performance  indicators.  Results  from  previous 
research have been controversial. Therefore, we have endeavored to explain how:

- performance indicators were selected and calculated in alternative ways to
  minimize bias;
- different methods of factor analysis were used to explore the structure of
  performance concepts;
- tests were used to verify the structure of performance concepts; and
- seven performance indicators were identified as being the most useful for
  cross-system analysis.

PERFORMANCE  EVALUATION  USING  SECTION  15  DATA 

Section 15 data have been crucial to the analysis: only with a nationwide set of
comparable data can globally oriented performance indicators be identified and
assessed. A wide variety of Section 15 statistics was
evaluated  as  performance  indicators.  Three  categories  of  statistics — service  inputs, 
service  outputs  and  service  consumption — provided  the  framework  to  organize  the 
much  larger  set  of  data. 

Figure  2-1  portrays  the  organizing  framework  developed  in  the  Fielding,  et  al. 
performance concept model. Cost-efficiency indicators relate service inputs
(labor, capital, fuel) to the amount of service produced (service outputs: vehicle
hours, vehicle miles, capacity miles, service reliability). Cost-effectiveness
indicators measure the level of service consumption (passengers, passenger miles,
operating revenue) against service inputs. Finally, service-effectiveness indicators
measure the extent to which service outputs are consumed.

The overriding goal of this research was to identify those key performance
statistics (1) that provide transit analysts with the most salient performance
information and (2) that target information which is equally valid for each transit
agency and thus for cross-system analysis.

[3] T. A. Patton, Transit performance indicators. Transportation Systems Center
Staff Study #SS-67-0.5-01. (Washington: U.S. Department of Transportation, 1985.)

FIGURE 2-1. FRAMEWORK FOR TRANSIT

One  result  of  the  analyses  that  follow  was  the  identification  of  a  small,  unique 
set  of  key  performance  indicators  that  met  the  overriding  goal  of  this  research. 
Seven  performance  variables  from  a  much  larger  data  set  were  identified.  These 
can  be  used  to  assess  the  performance  of  any  fixed  route,  motor  bus,  transit 
system.  A  minimum  of  three  of  the  seven  variables  will  provide  key  information  on 
cost  efficiency,  cost  effectiveness  and  service  effectiveness.  Further,  all  seven  of 
these  performance  indicators  and  a  parallel  set  of  "alternates"  can  be  used  for 
cross-system  comparisons  with  peers. 

The  following  sections  describe  how  Section  15  data  was  used  to  identify  these 
seven  performance  indicators,  and  how  they  were  rigorously  tested  to  ensure  their 
validity  for  use.  The  main  focus  of  this  research  has  been  to  provide  transit  analysts 
with  a  set  of  easily  accessible  statistics  with  which  to  do  individual  and  peer  group 
comparisons  of  performance.  The  second  goal  was  to  evaluate  the  validity  of  the 
earlier  analysis  conducted  on  the  FY  1979  data.  The  body  of  this  chapter  explains 
how  both  goals  were  accomplished. 

SELECTING  PERFORMANCE  INDICATORS 

A  wide  variety  of  performance  indicator  ratios  was  available  from  the  Section 
15 data base. In selecting the set of performance indicators to be used for further
analysis, we included variables that relate to the conceptual model, i.e., those
that would best represent the three categories of performance concepts:
cost-efficiency,  cost-effectiveness  and  service-effectiveness.  Particular  attention 
was  given  to  the  availability  and  reliability  of  the  data  from  which  the  ratios  would 
be  calculated.  As  noted,  some  of  the  Section  15  data  variables  were  more  complete 
or  more  reliable  than  others. 

Table  2-1  lists  the  initial  set  of  forty-eight  variables  selected  for  further 
multivariate  analysis.  The  variables  are  organized  under  the  performance  concept 
to which they relate. In most cases (other than passenger data), this set of
forty-eight variables represents the most complete, generally reliable and
non-redundant performance indicators available in the current (FY 1980) Section 15
data set.

Variables based on revenue capacity miles were not included because of a
detected inconsistency in the measurement of that variable across systems. Ratios
based on population data were not included because available population information
reflected total urban population rather than service area population. Otherwise,
performance indicator ratios comparable to the 1979 data analyses were selected
for use. This facilitated comparison with previous results and identification of
shifts due to the better-collected, cleaner and more complete data.

TABLE 2-1. PERFORMANCE INDICATORS BY CONCEPT

COST EFFICIENCY MEASURES

Labor Efficiency
  Vehicle Hours per Employee                              TVH/EMP
  Revenue Vehicle Hours per Operating Employee Hour       RVH/OEMP
  Vehicle Miles per Employee                              TVM/EMP
  Peak Vehicles per Executive, Professional and           PVEH/ADM
    Supervisory Employees
  Peak Vehicles per Operating Personnel                   PVEH/OP
  Peak Vehicles per Maintenance, Support and              PVEH/MNT
    Servicing Personnel

Vehicle Efficiency
  Vehicle Hours per Active Vehicle                        TVH/AVEH
  Vehicle Hours per Peak Vehicle Requirement              TVH/PVEH
  Vehicle Miles per Active Vehicle                        TVM/AVEH
  Vehicle Miles per Peak Vehicle Requirement              TVM/PVEH
  Revenue Vehicle Miles per Vehicle Miles                 RVM/TVM

Fuel Efficiency
  Revenue Vehicle Miles per Gallon Diesel                 RVM/FUEL
  Vehicle Miles (Bus) per Gallon Diesel                   TVM/FUEL

Maintenance Efficiency
  Total Vehicle Miles per Maintenance Expense             TVM/MEXP
  Vehicle Miles per Maintenance Employee                  TVM/MNT
  1,000,000 Vehicle Miles per Roadcall                    TVM/RCAL

Output per Dollar Cost
  Revenue Vehicle Hours per Operating Expense             RVH/OEXP
  Vehicle Miles per Operating Expense                     TVM/OEXP
  Revenue Vehicle Hours per Total Labor and Fringe        RVH/TWG
    Expenses
  Revenue Vehicle Hours per Operations Labor and          RVH/OWAG
    Fringe Expenses
  Revenue Vehicle Hours per Vehicle Maintenance           RVH/VMWG
    Labor and Fringe Expenses
  Revenue Vehicle Hours per Administrative Labor          RVH/ADWG
    and Fringe Expenses

SERVICE EFFECTIVENESS MEASURES

Utilization of Service
  Passenger Trips per Revenue Vehicle Hour                TPAS/RVH
  Passenger Trips per Revenue Vehicle Mile                TPAS/RVM
  Passenger Trips per Peak Vehicle                        TPAS/PVH
  Passenger Miles per Passenger                           PASM/TPS

Operating Safety
  1,000,000 Vehicle Miles per Accident                    TVM/ACC
  Revenue Vehicle Hours per Accident                      RVH/ACC

Revenue Generation
  Passenger Revenue per Peak Vehicle                      REV/PVEH
  Passenger Revenue per Revenue Vehicle Hour              REV/RVH
  Operating Revenue per Revenue Vehicle Hour              OREV/RVH
  Passenger Revenue per Passenger                         REV/TPAS

Public Assistance
  Revenue Vehicle Hours per Local Capital and             RVH/LSUB
    Operating Assistance
  Revenue Vehicle Hours per State Capital and             RVH/SSUB
    Operating Assistance
  Revenue Vehicle Hours per Total Operating Assistance    RVH/OSUB
  Revenue Vehicle Hours per Total Capital and             RVH/TSUB
    Operating Assistance
  Passengers per Local Operating Assistance               TPAS/LOA
  Passengers per Total Operating and Capital Assistance   TPAS/TSUB
  Passenger Revenue per Total Operating and Capital       REV/TSUB
    Assistance
  Passenger Revenue per Total Operating Assistance        REV/OSUB
  Passengers per Total Operating Assistance               PAS/OSUB

COST EFFECTIVENESS MEASURES

Service Consumption per Expense
  Passengers per Operating Expense                        PAS/OEXP
  Passenger Miles per Operating Expense                   PASM/OEX
  Passengers per Total Labor and Fringe Benefits          PAS/TWAG
  Passengers per Gallon Diesel Fuel                       PAS/FUEL
  Passenger Miles per Total Expense                       PASM/TEX

Revenue Generation per Expense
  Ratio Operating Revenue to Operating Expense            OREV/OEXP
  Ratio Total Revenue to Total Expense                    TREV/TEX

Missing  Data  Effects 

Missing  values  encountered  at  any  point  in  the  computation  of  basic  and  ratio 
variables  and  during  the  multivariate  statistical  procedures  cause  a  "snowball" 
effect  of  missing  information  to  occur.  The  assumption  in  the  computation  and 
analysis  procedures  is  that  every  case  has  information  for  all  of  the  variables.  This 
problem  and  solutions  used  to  address  it  were  discussed  in  Chapter  1  under  Data 
Reorganization.  If  any  case  is  missing  even  one  piece  of  information  it  is  thrown 
out  of  the  computations  and  subsequent  analyses.  The  missing  values  problem  has  a 
cumulative  effect  as  cases  are  dropped  from  the  analysis.  Thus,  from  a  total  of  304 
transit  systems  running  fixed  route,  motor  bus  service,  only  two-thirds  of  the 
cases — 198  systems — had  enough  information  available  for  use  in  the  analyses. 
However,  this  is  a  vast  improvement  over  the  98  systems  which  could  be  used  from 
the  FY  1979  data. 
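
The cumulative loss of cases can be illustrated with a minimal sketch. The array below is hypothetical toy data, not Section 15 data, and the helper name `complete_case_mask` is introduced only for illustration. Each variable is missing a single value, yet half of the cases are lost once complete information is required on every variable.

```python
import numpy as np

def complete_case_mask(data):
    """Return True for each row (case) with no missing value in any column."""
    return ~np.isnan(data).any(axis=1)

# Hypothetical toy data: 6 systems x 3 ratio variables; NaN marks a missing value.
toy = np.array([
    [1.0, 2.0, 3.0],
    [1.5, np.nan, 2.5],
    [2.0, 1.0, np.nan],
    [np.nan, 2.2, 3.1],
    [1.2, 2.4, 2.9],
    [1.8, 2.1, 3.3],
])

per_variable_missing = np.isnan(toy).sum(axis=0)  # missing values per variable
usable = complete_case_mask(toy)                  # cases surviving deletion
print("missing per variable:", per_variable_missing)
print("usable cases:", int(usable.sum()), "of", len(toy))
```

Because the missing values fall in different cases for different variables, the number of dropped cases grows with every variable added to the analysis.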

Distribution  of  the  Data 

One  of  the  first  tasks  for  exploring  the  data  set  was  to  search  for  extreme 
outliers  and  to  remove  them  from  the  analysis.  Extreme  outliers  could  force  the 
analysis  to  focus  on  the  inflated  variance  due  to  the  presence  of  an  outlier,  rather 
than  the  more  true-to-data  variance  present  across  the  range  of  the  other  cases. 

The  next  task  was  to  check  the  univariate  descriptive  statistics  for  each  of  the 
selected  performance  indicator  ratios  to  evaluate  the  distribution  of  the  case  values 
across  the  variable  range.  Most  commonly  used  bivariate  and  multivariate 
procedures  assume  a  normal-like  distribution  of  the  case  values  in  each  variable. 

As  noted  in  Chapter  1,  the  large  proportion  of  small  systems  and  the  presence 
of  a  few  very  large  transit  systems  affected  the  distribution  of  data  in  most  of  the 
basic  variables.  Two  descriptive  statistics  that  provide  information  on  how  far  a 
variable  deviates  from  a  normal-like  distribution  of  values  are  skewness  and 
kurtosis. For a normal distribution of data, skewness equals zero and kurtosis,
when expressed as the excess over the normal value as statistical packages
commonly report it, also equals zero;
for each statistic the further from zero the value, the less normal-like is the data
distribution. The less normal-like the distribution, the more questionable the
statistical results.

Included as Appendix C is a listing of relevant descriptive statistics for each of
the  forty-eight  performance  indicators  selected  for  further  analysis.  The  skewness 
and  kurtosis  values  for  the  list  of  forty-eight  variables  ranged  from  -5.212  to  16.098 
and  from  1.375  to  263.908  respectively,  indicating  that  the  distributions  were  far 
from  normal.  The  proposed  multivariate  procedures  to  be  used  on  the  performance 
indicator  data  set  were  considered  relatively  "robust,"  i.e.,  valid  even  under 
deviations  from  normality.  Robustness  is  of  greatest  concern  when  using  inferential 
statistical  techniques.  However,  even  descriptive  techniques,  like  the  ones  used 
here,  could  be  affected  by  highly  skewed  data.  As  the  goal  of  this  research  was  to 
provide  a  highly  reliable  set  of  consistent  analytical  findings  that  could  serve  as  a 
benchmark  for  cross-year  comparisons,  it  was  important  to  begin  with  a  set  of  data 
that  had  a  minimum  of  distributional  problems. 
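
As a hedged illustration of the two statistics, the sketch below computes skewness and excess kurtosis from simple population moments. The function names and sample values are assumptions made for this example; statistical packages typically apply small-sample adjustments, so their reported values can differ slightly from these.

```python
import numpy as np

def skewness(values):
    """Third standardized moment (population form); zero for symmetric data."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution gives zero."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return (d**4).mean() / (d**2).mean() ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]          # hypothetical, evenly spread values
one_large_case = [1, 1, 2, 2, 50]    # hypothetical, one very large system

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(one_large_case), excess_kurtosis(one_large_case))
```

A single very large case among many small ones is enough to push both statistics well away from zero, which is the pattern described for the Section 15 variables.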

To  counter  any  possible  bias  in  the  analyses  and  to  provide  a  comparable  set  of 
more  normally  distributed  performance  indicator  variables,  the  base  10  logarithms 
of  the  forty-eight  performance  indicators  were  calculated.  Logarithms  preserve  the 
essential  data  structure  of  the  variables  from  which  they  arise  while  shifting  the 
distribution of the data to a more normally shaped, i.e., less skewed, curve.^4 This
provided  two  sets  of  comparable  data — the  forty-eight  performance  indicator  ratios 
calculated  from  the  Section  15  data  and  a  set  of  forty-eight  logarithm  variables 
calculated  from  these. 
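
The effect of the transformation can be sketched as follows. The sample is hypothetical: fifty positive values spread evenly across three orders of magnitude, chosen so that the raw values are strongly right-skewed. The example assumes strictly positive ratios, since logarithms are undefined otherwise.

```python
import numpy as np

def skewness(values):
    """Third standardized moment (population form)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

# Hypothetical ratio variable: many small values and a few very large ones.
raw = 10.0 ** np.linspace(0.0, 3.0, 50)
logged = np.log10(raw)               # base 10 logarithm, as in the text

print("skewness of raw values:   ", skewness(raw))
print("skewness of log10 values: ", skewness(logged))
```

The ordering of the cases is unchanged by the transformation; only the spacing between them is compressed at the high end.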

In  developing  the  strongest  set  of  data  on  which  to  base  analytical  findings,  a 
second  question  arose.  As  mentioned  in  Chapter  1,  revenue  data  is  reported  as  a 
total  for  the  whole  system;  it  is  not  broken  down  by  mode  when  more  than  one  mode 
exists.  It  had  also  been  necessary  to  use  total  subsidy  information.  A  third  set  of 
performance  ratios  was  developed  using  basic  variable  data,  subsidy  information, 
and  revenue  statistics  that  were  weighted  to  eliminate  revenue  from  modes  other 
than  bus  transit.  Then,  a  full  set  of  forty-eight  base  10  logarithms  was  calculated 
on  the  weighted  data,  again,  to  provide  a  less  skewed  data  distribution. 

As a result of the cleaning, verifying and grooming, four somewhat different
sets of performance indicator data were available: a) ratios from reported data,
b) logs of reported data variables, c) ratios from the weighted reported data, d) logs
of the weighted data variables. As noted, Appendix C includes descriptive statistics
for each of the four sets of data. The purpose for developing these four sets was to
ensure that when final results from multivariate analyses were reported, most
contingencies for possible bias in the data had been addressed. Consistent results
across the four data sets would provide evidence that a stable performance concept
structure had been found in the data.

^4 J. B. Kruskal, Transformations of data. International encyclopedia of the
social sciences. David L. Sills, Ed., Vol. 15. (New York: Macmillan Co., 1968), pp.
182-192.

EXPLORATORY  ANALYSES 

Multivariate  analyses  were  used  to  search  for  a  highly  consistent  set  of 
performance  concepts  relevant  to  fixed  route  transit  and  for  a  small,  unique  subset 
of  conveniently  useable  performance  indicators.  Factor  analysis  is  ideal  for 
detecting  the  most  salient  features  of  a  set  of  data  and  for  determining  those  few 
key  variables  with  which  a  whole  range  of  information  can  be  represented.  The 
prime  objective  in  this  research  was  to  search  for  the  minimum  amount  of  data 
necessary  to  convey  the  maximum  amount  of  performance  information.  Parsimony 
and  consistency  were  the  key  criteria;  factor  analysis  was  the  most  efficient  means. 

Factor  Analysis  Defined 

The  most  distinctive  characteristic  of  factor  analysis  is  its  ability  to  reduce  a 
large  set  of  data  to  a  smaller  set  of  "components"  or  "factors"  which  portray  the 
underlying  structure  of  relationships  among  a  set  of  variables.  Based  upon  the 
correlation  patterns  of  a  large  number  of  variables,  the  objective  of  the  factor 
analytic  technique  is  to  group  together  those  variables  which  are  highly  correlated 
with  each  other.  The  analyst  then  interprets  each  factor  according  to  the  variables 
belonging  to  the  group.  The  idea  is  to  summarize  many  variables  by  using  a  few 
representative  factors.  Appendix  D  portrays  the  correlation  matrix  for  the 
variables  from  the  weighted  reported  data — the  more  correct  of  the  two  raw  data 
sets. 

There  are  two  main  types  of  factor  analyses,  principal  components  analysis  and 
inferential  or  "classical"  factor  analysis.  The  former  works  from  the  assumption 
that  the  entire  population  of  cases — not  a  sample — is  being  analyzed.  Analytical 
solutions  describe  the  data  at  hand  and  the  relationships  among  the  variables  as 
represented  in  the  input  data.  Inferential  factor  analysis,  however,  adjusts 
analytical  solutions  to  make  predictions  about  a  larger,  ideal  population.  Because 
the  entire  population  of  motor  bus  systems  was  represented  in  the  data,  and  because 
no  sampling  technique  had  been  used  to  select  systems  for  analysis,  principal 
components factor analysis was used.

The basic factor analysis model assumes that in any set of variables, there exist
two  main  types  of  variation  or  variance:  variance  commonly  shared  by  all  the 
variables  in  the  set  and  variance  unique  to  each  individual  variable.  Commonly 
shared  variance  contributes  to  the  intercorrelations  of  variables.  The  patterns  of 
intercorrelations  are  used  to  group  variables  into  a  smaller  number  of  factors.  The 
number  of  factors  necessary  to  portray  this  underlying  data  structure  depends  on 
how  much  more  commonly  shared  variance  continues  to  be  detected  with  the 
addition  of  each  new  factor.  The  order  in  which  the  factors  emerge  from  the  data  is 
important.  The  first  factor  accounts  for  the  largest  portion  of  shared  variance  in 
the  data.  With  each  successive  factor,  less  and  less  of  the  shared  variance  is 
accounted  for.  At  the  point  where  little  more  explained  variance  is  detected,  the 
procedure  halts  and  the  factor  structure  is  considered  complete. 

Factor  analysis  not  only  provides  information  on  the  number  of  factors 
underlying  the  data,  it  also  determines  which  variables  grouped  on  a  particular 
factor  are  most  highly  related  or  representative  of  the  identified  factor.  The  factor 
loading  of  each  variable  on  the  respective  factors  can  be  interpreted  as  the 
correlation  of  the  variable  with  the  factor;  high  factor  loadings  represent  high 
correlations. 
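
A minimal sketch of how principal component loadings arise from a correlation matrix is given below. It is an illustration under stated assumptions, not the SPSS or BMDP routine used in this research: the function name `principal_component_loadings` and the synthetic data are hypothetical. Each loading is an eigenvector element scaled by the square root of its eigenvalue, which makes it the correlation between the variable and the component.

```python
import numpy as np

def principal_component_loadings(data, n_factors):
    """Unrotated loadings and share of variance for the leading components."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]    # largest eigenvalues first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order] / corr.shape[0]       # proportion of total variance
    return loadings, explained

# Synthetic data: two pairs of variables, each pair driven by its own source.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.3, size=(200, 4))
data = np.column_stack([source[:, 0], source[:, 0], source[:, 1], source[:, 1]]) + noise

loadings, explained = principal_component_loadings(data, n_factors=4)
print(np.round(loadings[:, :2], 2))
print(np.round(explained, 3))
```

The proportions in `explained` decline from the first component onward, which corresponds to the ordering of factors by shared variance described above.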

In  performing  any  factor  analysis,  there  are  several  problem  areas  that  could 
exist in the data and obscure the underlying data structure:^5

1)  Two  variables  carry  highly  redundant  information  (collinearity).  A 
correlation  coefficient  of  .98  or  larger  between  two  variables  would  show 
that  either  variable  could  be  used  to  present  nearly  the  same  information. 

2)  A  variable  loads  across  several  factors  equally  well  (poorly  defined 
structure  in  the  variable).  When  a  variable  portrays  a  pattern  of  factor 
loadings  that  are  either  almost  equal,  or  are  high  across  several  factors, 
the  variable  does  not  contribute  to  defining  the  underlying  structure  of 
the  data  set. 

3)  One  factor  has  all  or  most  of  the  variables  weighting  heavily  on  it  (poorly 
defined structure in the data set). Such a factor then becomes a complex
"catch-all" category for data, and the underlying concepts of the data
become  obscured. 
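
The first of these checks can be sketched in a few lines. The example below is an assumption-laden illustration: the variable names X1 to X3 and their values are invented, and only the .98 threshold comes from the text.

```python
import numpy as np

def redundant_pairs(data, names, threshold=0.98):
    """List variable pairs whose absolute correlation meets the threshold."""
    corr = np.corrcoef(data, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical variables: X2 is a linear rescaling of X1, X3 is unrelated.
x1 = np.arange(1.0, 11.0)
x2 = 2.0 * x1 + 1.0
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
toy = np.column_stack([x1, x2, x3])

pairs = redundant_pairs(toy, ["X1", "X2", "X3"])
print(pairs)
```

When a pair is flagged, one of the two variables can be dropped with little loss of information.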


^5 A. L. Comrey, A first course in factor analysis. (New York: Academic Press,
1973), pp. 189-197.

The first exploratory factor analysis was begun with the most complete set of
performance indicator ratios available in the Section 15 data, i.e., the forty-eight
performance  indicators  selected  for  analyses.  It  remained  necessary  to  assess  how 
well  these  variables  measured  the  target  information  and  how  relevant  the 
indicators  were  for  cross-system  analysis.  The  next  task  involved  determining  from 
the  set  of  forty-eight  variables,  which  subset  of  variables  provided  the  best 
cross-sectional  measures  and  best  defined  the  structure  in  the  data  while  testing  the 
data  for  the  three  possible  contaminating  problems  listed  above.  After  each 
exploratory  factor  analysis  was  performed,  the  resulting  factor  loading  matrix  was 
evaluated. 

As  four  parallel  sets  of  performance  indicators  were  available,  the  same  type  of 
exploratory  factor  analysis  was  carried  out  on  each  set.  Finding  similar  results 
across  the  four  data  sets  would  signal  detection  of  the  consistency  in  the  data  which 
would  point  to  the  "true"  underlying  structure  in  the  variables. 

Variable  Elimination 

In  the  first  exploratory  analysis  on  the  full  set  of  forty-eight  performance 
indicators,  a  total  of  128  cases  were  included  in  the  analysis.  As  mentioned  earlier, 
factor  analysis  will  drop  from  the  analysis  every  case  missing  any  piece  of 
information. Because the missing values were scattered throughout the forty-eight
variable set, the snowball effect of missing data across a set of variables had
eliminated nearly two-thirds of the cases from the analysis. Thus, in the next
exploratory pass through the data, it was decided to eliminate from further
analyses those variables that compounded the missing data problem and those that
were still somewhat questionable as to the quality and comparability of reported
information.

Fuel  related  variables  (RVM/FUEL,  TVM/FUEL)  were  omitted  because  with 
four  different  types  of  fuel  listed  for  motor  bus  operations  it  was  difficult  to  validly 
compare  fuel  efficiency  across  systems.  Local  and  state  subsidy  related  variables 
(e.g.,  RVH/LSUB,  RVH/SSUB)  were  removed  because  definitions  of  local  versus 
state  subsidies  were  inconsistent.  Capital  subsidy  variables  were  omitted  because 
they  can  greatly  shift  from  year  to  year. 

The  passenger  miles  (PASM)  variable  was  missing  from  almost  20%  of  the 
cases.  To  increase  the  number  of  cases  entering  into  the  analysis,  variables  based 
on  PASM  (e.g.,  PASM/OEX,  PASM/TPS)  were  eliminated  from  the  data  set. 


Variables related to active vehicle counts were also removed because about a
third of the cases had a problem of some sort. A distinction was not always made
between school buses, charter buses and other motor buses. Some cases listed more
active vehicles than total vehicles, and vehicle inventories were incomplete for some
companies.

The  variable  RVM/TVM  was  eliminated  because  sixty-five  of  the  cases  had 
revenue  vehicle  miles  equal  to  total  vehicle  miles,  a  strong  indication  of  a 
definitional  problem,  which  greatly  inflated  the  kurtosis  value  of  the  variable.  The 
roadcall  related  variable,  TVM/RCAL,  was  ignored  because  the  definitions  for  what 
makes  a  true  roadcall  were  unreliable.  The  variables  related  to  total  expense  (e.g., 
PASM/TEX,  TREV/TEX)  were  removed  because  total  expense  is  not  truly 
comparable  across  systems;  there  are  no  set  parameters  for  depreciating  capital 
costs.  Finally,  REV/RVH  was  so  highly  correlated  with  OREV/RVH  that  it  was 
eliminated,  to  counter  redundancy  in  the  data. 

With  each  exploratory  factor  analytic  pass  through  the  data  sets,  the  variables 
were  checked  against  the  factor  structure  to  determine  if  remaining  variables 
presented  any  of  the  structural  problems  mentioned  above.  The  factor  loading 
pattern  resulting  from  each  of  the  exploratory  analyses  was  evaluated  to  identify 
that  set  of  variables  which  best  determined  the  emerging  underlying  structure  of  the 
data.  With  each  pass  through  the  data,  the  underlying  structure  became  more 
clearly  defined.  The  number  of  cases  entering  into  the  analysis  had  increased  from 
128  to  198  and  the  same  general  solution  appeared  across  the  four  different  sets  of 
data. 

The  final  set  of  thirty  performance  indicators  that  remained  after  the  fourth 
pass  through  the  data  reflected  a  strong  set  of  performance  indicator  variables. 
These  portrayed  such  highly  consistent  factor  loadings  across  all  four  data  sets  that 
it  was  evident  that  the  most  salient  features  of  the  performance  concept  model  had 
been  identified. 

Table 2-2 lists the forty-eight performance indicator variables selected for
analysis from the Section 15 data base. They are portrayed within the framework of
the Fielding et al. conceptual model. Those variables eliminated prior to the final
analysis are marked with asterisks (**) to offset them from the final set of thirty
performance indicators used in subsequent analyses.

TABLE 2-2. FORTY-EIGHT PERFORMANCE INDICATOR VARIABLES USED IN ANALYSES

COST EFFICIENCY MEASURES

    TVH/EMP        RVH/OEMP       TVM/EMP        PVEH/ADM
    PVEH/OP        PVEH/MNT     **TVH/AVEH       TVH/PVEH
  **TVM/AVEH       TVM/PVEH     **RVM/TVM      **RVM/FUEL
  **TVM/FUEL       TVM/MEXP       TVM/MNT      **TVM/RCAL
    RVH/OEXP       TVM/OEXP       RVH/TWG        RVH/OWAG
    RVH/VMWG       RVH/ADWG

SERVICE EFFECTIVENESS MEASURES

    TPAS/RVH       TPAS/RVM       TPAS/PVH     **PASM/TPS
    TVM/ACC        RVH/ACC        REV/PVEH     **REV/RVH
    OREV/RVH       REV/TPAS     **RVH/LSUB     **RVH/SSUB
    RVH/OSUB     **RVH/TSUB     **TPAS/LOA     **TPAS/TSUB
  **REV/TSUB       REV/OSUB       PAS/OSUB

COST EFFECTIVENESS MEASURES

    PAS/OEXP     **PASM/OEX       PAS/TWAG     **PAS/FUEL
  **PASM/TEX       OREV/OEXP    **TREV/TEX

**Variable omitted prior to final analysis

FINAL FACTOR ANALYSIS ON THIRTY PERFORMANCE INDICATOR RATIO VARIABLES

The  final  factor  analysis  was  carried  out  on  the  cleaned  set  of  thirty 
performance  indicator  ratio  variables.  After  all  the  data  cleaning  and  verifying 
strategies,  after  all  the  exploratory  passes  through  the  data  and  after  all  the 
considerations for data quality, these thirty variables were chosen to represent the
best  possible  information  on  performance  currently  available  in  the  Section  15  data 
base. 

Principal  component  factor  analysis  with  varimax  orthogonal  rotation  was 
carried  out  on  the  four  different  sets  of  thirty  performance  indicator  variables. 
Two different computer routines were used: SPSS-PA1^6 and BMDP-P4M.^7 The
latter  was  used  to  compare,  as  closely  as  possible,  the  current  analyses  with  the 
previous  work. 
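
For readers unfamiliar with the rotation step, a minimal sketch of a varimax orthogonal rotation is given below. It is a commonly used iterative formulation based on the singular value decomposition and is offered only as an illustration; it is not the SPSS or BMDP implementation, and the loading values are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally to increase the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_objective = s.sum()
        if objective != 0.0 and new_objective / objective < 1 + tol:
            break
        objective = new_objective
    return loadings @ rotation

def criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (loadings**2).var(axis=0).sum()

# Hypothetical simple structure, deliberately blurred by a 0.5 radian turn.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
turn = np.array([[np.cos(0.5), -np.sin(0.5)], [np.sin(0.5), np.cos(0.5)]])
unrotated = simple @ turn
rotated = varimax(unrotated)

print(np.round(rotated, 2))
print(criterion(unrotated), criterion(rotated))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors is sharpened.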

The  patterns  of  factor  loadings  were  so  similar  between  the  reported  data, 
weighted  data  and  the  two  sets  of  logs  that  it  appeared  very  convincing  that  the 
underlying  structure  in  the  data  set  had,  indeed,  been  found.  Appendix  E  contains 
the  factor  pattern  matrices  for  the  final  factor  analyses. 

The  set  of  the  "best"  thirty  performance  indicators  was  analyzed  across  the 
four different data sets. With the exception of the raw, unweighted data, exactly
the  same  number  of  factors,  in  the  same  order,  and  portraying  the  same  factor 
loading  pattern,  emerged. 

Seven factors, accounting for approximately 85% of the variance, emerged from
the analysis. Table 2-3 portrays the pattern of factor loadings for the final
weighted data set. Factors One, Two and Three represent output per dollar cost,
utilization of service and revenue generation per expense, respectively. These first
three factors directly relate to the three major categories of the performance
concept model outlined by Fielding, et al.: cost efficiency, service effectiveness
and cost effectiveness.

Factors Four, Five and Six represent labor efficiency, vehicle efficiency and
maintenance efficiency, respectively. Finally, Factor Seven is clearly related to
safety. Only the raw data set portrayed an eighth factor. It seemed to be weakly
related to public assistance. However, the raw data set of variables had been based
on reported information alone, without disaggregating the multi-mode information
on revenues and subsidies. Thus, the weakly defined eighth factor appeared to be
only an artifact of the aggregated data.

^6 N. H. Nie, C. H. Hull, J. G. Jenkins, K. Steinbrenner and D. H. Bent, SPSS,
statistical package for the social sciences. (New York: McGraw-Hill, 1975.)

^7 W. J. Dixon, Ed., BMDP statistical software 1981. (Los Angeles: University of
California Press, 1981.)

VERIFYING  THE  FINAL  1980  FACTOR  ANALYSIS 

The adequacy and strength of the final solution were determined by Thurstone's
five criteria for detecting simple structure solutions in factor analysis results.^8
His criteria are as follows:

1. There should be at least one zero in each row of the factor loading matrix.

2. If m common factors appear in the structure, each column of the factor
loading matrix should have at least m zeros.

3. For every pair of identified factors of the factor loading matrix:

a. there should be several variables that load highly on one of the
factors and minimally on the other.

b. a large proportion of the variables should load minimally on both
factors (when there are four or more factors).

c. there should be only a small number of high loading variables on
both factors.

The rotated factor loading structure was compared against Thurstone's criteria for
evaluating structure for its "simpleness" and met each of the qualifying conditions.
This was convincing evidence that a clear, underlying structure in the data had been
found.

In interpreting and portraying the factor loading pattern, an arbitrary cut-off
of .5 had been used as a factor load value. The high-loading, i.e., representative,
variables for any factor were identified with a .5 factor load, but .5 is strictly an
arbitrary choice. Factor loadings of .3 and above are commonly listed among those
high enough to provide some interpretative value. However, values of .45 or less
generally do not provide a very good basis for factor interpretation.^9 It was felt
that a high cut-off value would make for easier and clearer interpretation of the
factors.

^8 H. H. Harman, Properties of different types of factor solutions. Modern
factor analysis. (Chicago: University of Chicago Press, 1967), pp. 97-99.

^9 A. L. Comrey, A first course in factor analysis. (New York: Academic Press,
1973).

The next question was: How much of the variance of the final factor solution
was not being accounted for by the identified "high-loading" variables? The data
were  tested  by  regressing  the  high  loading  variables  against  the  full  set  of  variables 
representing  each  factor.  For  each  factor,  approximately  95%  of  the  information 
was  still  being  represented.  Overall,  86%  of  the  total  variance  of  the  original  factor 
structure  was  represented  in  the  subset  of  high-loading  variables. 
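
The idea of a loading cut-off can be illustrated with a simplified sketch. It is not the regression test described above: it merely flags variables at or above the .5 cut-off and reports the share of each factor's summed squared loadings that those variables carry. The function name and the loading values are hypothetical.

```python
import numpy as np

def high_loading_share(loadings, cutoff=0.5):
    """Flag high-loading variables and the share of each factor they carry."""
    squared = loadings**2
    high = np.abs(loadings) >= cutoff
    share = (squared * high).sum(axis=0) / squared.sum(axis=0)
    return high, share

# Hypothetical rotated loadings: 4 variables on 2 factors.
toy_loadings = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.6],
])

high, share = high_loading_share(toy_loadings)
print(high)
print(np.round(share, 3))
```

In this toy case two variables pass the cut-off on each factor and together carry more than nine-tenths of that factor's squared loadings.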

Reliability 

A  third  question  regarding  the  set  of  high-loading  variables  that  defined  the 
factor  structure  centered  on  the  reliability — in  a  statistical  sense — of  the  grouped 
variables.  Cronbach's  Alpha  was  calculated  for  each  group  of  variables  gathered 
together  on  a  particular  factor. 

Cronbach's  Alpha  can  be  used  to  evaluate  the  internal  consistency  of  a  group  of 
variables to see if they essentially target the same underlying information.^10 Alpha
values  range  from  zero  to  one  with  a  value  equal  to  one  representing  perfect 
reliability,  or  internal  consistency  in  this  case.  An  alpha  value  of  .8  is  considered 
very  reliable. 

Standardized  Item  Alpha  was  calculated  for  each  group  of  high-loading 
variables  on  each  factor,  and  for  each  of  the  four  sets  of  slightly  different 
performance  indicators.  The  alpha  values  hovered  around  the  .8  criterion  on  the 
weighted  data  set  and  were  all  well  above  .8  on  the  log  set  of  the  weighted  data. 
This  was  true  on  all  factors  except  Factor  5  which  produced  an  uninterpretable 
alpha  value.  Factor  5  measures  the  positive  and  negative  poles  of  the  vehicle 
efficiency  concept  as  shown  in  the  negative  and  positive  factor  loadings.  Thus,  it 
confounds  the  calculation  of  standardized  item  alpha. 
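
The behavior described for Factor 5 can be reproduced with a minimal sketch of standardized item alpha, computed from the mean inter-item correlation as k r / (1 + (k - 1) r) for k items with mean correlation r. The item values below are hypothetical, and the sketch is not the package routine used in this research.

```python
import numpy as np

def standardized_alpha(data):
    """Standardized item alpha from the mean off-diagonal correlation."""
    corr = np.corrcoef(data, rowvar=False)
    k = corr.shape[0]
    mean_r = (corr.sum() - k) / (k * (k - 1))   # average of off-diagonal entries
    return k * mean_r / (1 + (k - 1) * mean_r)

# Hypothetical items that all rise together across eight cases.
base = np.arange(1.0, 9.0)
item_a = base
item_b = base + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
item_c = base + np.array([-0.3, 0.3, 0.3, -0.3, -0.3, 0.3, 0.3, -0.3])

same_direction = np.column_stack([item_a, item_b, item_c])
mixed_direction = np.column_stack([item_a, item_b, -item_c])  # one item reversed

alpha_same = standardized_alpha(same_direction)
alpha_mixed = standardized_alpha(mixed_direction)
print(alpha_same, alpha_mixed)
```

When one item runs in the opposite direction, the mean inter-item correlation turns negative and alpha falls outside the zero-to-one range, which is why a factor with both positive and negative loadings yields an uninterpretable value unless the reversed items are first re-scored.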

Factor  Structure  Stability 

Two  final  questions  were  raised  regarding  the  1980  final  factor  analysis.  They 
both  focused  on  a  single  concern — how  "globally"  relevant  was  the  final  factor 
structure?  Would  the  underlying  structure  of  the  data  remain  stable  over  different 
theoretical  assumptions  or  an  increase  in  data  cases? 


^10 E. G. Carmines and R. A. Zeller, Reliability and validity assessment.
(Beverly Hills, Calif.: Sage Publications, 1979.)

Classical inferential factor analyses were carried out on the four performance
indicator  variable  sets.  As  noted  previously,  this  type  of  analysis  assumes  that  the 
data  comes  from  a  random  sample  of  cases  from  a  larger  population.  All  solutions 
and  reported  statistics  are  mathematically  adjusted  to  predict  values  as  they  would 
exist  in  a  larger  population.  Thus,  it  is  conceivable  that  if  a  factor  structure  is 
somewhat  weakly  defined,  a  different  structure  could  emerge  from  an  inferential 
solution  than  from  a  principal  components  analysis.  However,  results  from  both  the 
inferential  and  principal  components  analyses  were  consistent  across  the  four  data 
sets. 

To  test  whether  the  final  structure  in  the  analyses  would  remain  stable  over  an 
increase  in  data  cases,  an  estimation  procedure  for  missing  data  was  used.  The 
BMDP  statistical  computing  package  includes  a  program  whereby  missing  data 
values  can  be  estimated.  Multiple  regression  on  the  variables  with  data  is  used  to 
predict  a  "most  likely  estimate"  for  any  case  missing  data  on  some  subset  of  the 
variables  in  the  analysis.  When  no  prediction  can  be  made  from  other  available 
data,  the  mean  of  the  variable  of  interest  is  used  to  replace  the  missing  value. 
When  any  case  is  missing  too  much  of  its  data,  it  is  not  used  in  the  estimation 
procedure. 
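
The estimation logic described above can be sketched as follows. This is a simplified illustration, not the BMDP program: the function name `impute_by_regression` and the data are hypothetical, regressions are fitted on complete cases only, and the rule for excluding cases with too much missing data is omitted.

```python
import numpy as np

def impute_by_regression(data):
    """Fill missing values by regression on the other variables, else the mean."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    complete = ~np.isnan(data).any(axis=1)
    n_vars = data.shape[1]
    for j in range(n_vars):
        others = [c for c in range(n_vars) if c != j]
        # Fit variable j on the remaining variables using complete cases only.
        design = np.column_stack([np.ones(complete.sum()), data[complete][:, others]])
        coef, *_ = np.linalg.lstsq(design, data[complete, j], rcond=None)
        column_mean = np.nanmean(data[:, j])
        for i in np.where(np.isnan(data[:, j]))[0]:
            predictors = data[i, others]
            if np.isnan(predictors).any():
                filled[i, j] = column_mean      # no prediction possible: use mean
            else:
                filled[i, j] = coef[0] + predictors @ coef[1:]
    return filled

# Hypothetical data in which the second variable equals 2 * first + 1.
toy = np.array([
    [1.0, 3.0],
    [2.0, 5.0],
    [3.0, 7.0],
    [4.0, np.nan],
    [np.nan, 11.0],
])

filled = impute_by_regression(toy)
print(filled)
```

Cases that would otherwise be dropped are retained with estimated values, which is how the number of cases entering the factor analysis could be increased.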

A  final  set  of  factor  analyses  was  carried  out  on  the  four  sets  of  performance 
indicators  where  missing  values  had  been  replaced  with  estimates.  The  number  of 
cases  then  being  analyzed  increased  from  194  to  280.  It  was  plausible  that  an 
increase  in  the  number  of  cases  being  analyzed  could  shift  a  weak  or  unstable  factor 
solution  to  a  different  factor  structure.  The  final  set  of  factor  analytic  solutions 
carried  out  from  the  data  sets  which  included  estimated  values  were  entirely 
consistent  with  the  earlier  results. 

Thus, after rigorous testing of the final 1980 factor analysis, it was found that:
1) the same general underlying structure appeared consistently across all
checking routines; 2) the same factors appeared, and they appeared in the same
order; and 3) apart from minor fluctuations, the factor loading patterns were
generally the same. Therefore, it was concluded that a stable, consistent and
reliable simple structure had been detected in the larger group of performance
indicators.

COMPARISON OF 1980 FINAL FACTOR ANALYSIS TO 1979 ANALYSIS

One  of  the  motivations  in  analyzing  this  data  in  this  way  was  to  provide  a 
comparison  with  the  previous  attempt  to  use  Section  15  data  for  performance 
evaluation. 

The  earlier  attempt  was  carried  out  on  the  first  year  (FY  1979)  data.  As  might 
be  expected,  there  were  many  more  problems  with  the  first  year  of  collected  data 
than  with  the  second  year  of  data.  The  former  data  set  was  fraught  with  missing 
data  problems,  imprecisely  reported  data,  and  less  careful  checking  procedures 
before  and  after  analysis. 

For  the  final  factor  analysis  on  the  1979  data,  one  set  of  raw  reported  data 
consisting  of  thirty-six  performance  indicator  variables  was  analyzed.  A  total  of 
ninety-eight  cases  (out  of  311)  were  in  the  analysis;  the  rest  dropped  out  due  to  the 
snowball  effects  of  missing  data.  Only  a  superficial  grooming  of  the  data  was  done. 
Thus,  many  erratic  values  and  questionable  zeros  remained  in  the  data. 

For  the  final  factor  analysis  on  the  1980  data,  four  sets  of  similar  data 
consisting  of  thirty  performance  indicator  variables  were  analyzed.  The  data  was 
carefully  groomed  for  accidental  or  inconsistent  values  and  strategies  were 
developed  to  differentiate  valid  zeros  from  "missing  data  zeros."  All  in  all,  there 
was  much  greater  confidence  in  the  1980  data  set  by  the  time  the  current  set  of 
factor  analyses  was  begun  than  was  possible  for  the  1979  data  set. 

Comparison  of  the  two  final  factor  structures — from  the  FY  1979  data  analysis 
and  from  the  FY  1980  data  analysis — shows  that  the  same  first  two  factors  emerge 
in  the  same  order  in  both  years.  Output  per  dollar  cost  and  utilization  of  service  are 
Factors  One  and  Two  respectively  for  both  factor  analyses.  Since  the  first  few 
factors  usually  account  for  a  large  amount  of  the  total  variance  in  the  data  set,  it 
was  clear  that  the  first  two  key  features  of  performance  had  been  identified  in  both 
years.  Appendix  E  also  includes  the  factor  loading  matrix  resulting  from  the 
FY  1979  data  analysis. 

From  that  point  on,  the  factor  structures  diverged  across  years.  The  remaining 
seven  factors  from  the  total  of  nine  factors  in  the  earlier  analyses  were  as  follows: 
vehicle efficiency, fuel efficiency, public assistance, social effectiveness,
maintenance efficiency, revenue per expense and safety. Because the set of performance
indicators  used  in  the  analyses  had  differed  across  years,  it  was  difficult  to  compare 
the  two  any  further. 

Fuel efficiency and social effectiveness related variables had been dropped in
the current analysis. The former did not lend themselves to valid cross-system
comparisons and the latter were not valid when based on other than service area
population. Thus, the two data sets differed somewhat in the variables used for the
analyses.

In  the  1979  data,  weighting  strategies  had  not  been  used  to  disentangle  the 
aggregated  revenue  and  subsidy  information.  Thus,  variables  relevant  to  those  areas 
were  clearly  contaminated  and  invalid  for  cross-system  single  mode  analyses.  The 
pattern  of  variation  in  such  variables  would  have  clearly  been  different  from  the 
other  variables  in  the  analysis,  and  the  identification  of  a  public  assistance  factor  in 
the  earlier  analysis  attests  to  that  fact. 

The  1979  analysis,  when  compared  with  the  current  set  of  analyses,  shows  that 
the  underlying  structures  are  not  so  different,  but  that  the  two  data  sets  from  which 
the  analyses  began  were  clearly  different.  In  the  current  research  there  was  a  great 
deal  more  confidence  concerning  the  variables  chosen  and  especially  regarding  the 
quality  of  the  data  itself.  It  was  strongly  felt  that  the  1980  data  analyses  had,  in 
fact,  detected  the  key  underlying  concepts  of  performance  for  this  data.  The 
increase  in  number  of  cases  analyzed,  the  many  analyses  on  the  four  parallel  sets  of 
data,  and  finally,  the  rigorous  verifying  and  validating  procedures  provided  a  great 
deal  of  confidence  in  the  final  results. 

Further,  the  fact  that  both  years  of  data  had  detected  many  of  the  same 
concepts,  despite  the  poorer  quality  of  the  1979  data,  provided  stronger  validation 
for  the  conceptual  model  of  transit  performance.  However,  the  final  structures 
detected  with  the  FY  1979  and  FY  1980  data  were  different.  The  order  in  which 
factors  emerged  from  the  data  was  not  the  same.  This  was  partly  due  to  the  use  of 
somewhat  different  sets  of  performance  indicator  variables  and  partly  the  result  of 
using  the  much  cleaner  and  more  complete  set  of  FY  1980  data.  Since  the  1980  data 
had  been  so  carefully  cleaned  and  verified,  it  was  evident  that  in  the  current 
analyses  not  only  the  underlying  concepts  had  been  detected,  but  their  relative 
importance  to  each  other  and  across  the  larger  set  of  available  data  had  also  been 
determined. 

SELECTING  REPRESENTATIVE  MARKER  VARIABLES 

A  result  of  this  research  was  the  establishment  of  a  small,  unique  subset  of 
performance  indicators  that  are  particularly  useful  for  performance  evaluation  by 
individual  transit  managers  for  systems  of  all  sizes.  The  goal  was  to  identify  the 
minimum  amount  of  data  necessary  to  convey  the  maximum  amount  of  performance 
information. 

To accomplish this, the factor loading data in the rotated factor structure
solutions  on  the  final  variable  sets  were  used.  High  factor  loadings  represent  a  high 
correlation  of  a  particular  variable  with  a  particular  factor.  When  a  variable  has  a 
high  factor  loading  on  only  one  factor,  it  can  be  said  to  "represent"  that  factor  both 
statistically  and  conceptually. 

To select a small subset of easily accessible performance indicators from the
final factor structure, five criteria were used: 1) Representativeness of a variable
vis-a-vis  a  factor  was  reflected  in  a  high  factor  loading  on  only  one  factor.  2)  The 
distribution  of  values  in  the  variable  had  to  be  as  close  to  normal-like  as  possible. 
3)  Ease  of  collection  of  the  variable  was  assessed  via  the  percentage  of  data 
missing.  4)  The  variable  had  to  have  been  well  captured  by  the  factor  structure  in 
general  (high  communality).  5)  The  variable  selected  had  to  be  easily  understood  by 
transit  managers. 
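
The first criterion can be illustrated with a short sketch. It covers only the "high loading on one factor" rule; the thresholds (`min_primary`, `max_secondary`), the function name `pick_markers` and the loading values are assumptions made for illustration and are not taken from the study.

```python
def pick_markers(loadings, min_primary=0.6, max_secondary=0.4):
    """Choose one marker per factor from a table of factor loadings.

    loadings: {variable name: [loading on factor 0, factor 1, ...]}
    Returns {factor index: variable name}.
    """
    best = {}
    for var, row in loadings.items():
        mags = [abs(v) for v in row]
        primary = max(range(len(mags)), key=mags.__getitem__)
        secondary = max(m for i, m in enumerate(mags) if i != primary)
        # Skip variables that load weakly, or that load on several factors.
        if mags[primary] < min_primary or secondary > max_secondary:
            continue
        if primary not in best or mags[primary] > best[primary][1]:
            best[primary] = (var, mags[primary])
    return {factor: var for factor, (var, _) in best.items()}

# Hypothetical loadings for illustration only.
loadings = {
    "RVH/OEXP": [0.91, 0.05],
    "TVM/OEXP": [0.84, 0.12],
    "TPAS/RVH": [0.10, 0.88],
    "MIXED":    [0.62, 0.58],  # loads on both factors, so it is rejected
}
markers = pick_markers(loadings)
```

A fuller selection would also weigh the other four criteria (distribution shape, percentage of missing data, communality and ease of interpretation), which require judgment as well as computation.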

Seven  representative  or  "marker"  variables  were  selected  from  the  final  factor 
structure — one  variable  representing  each  factor.  Seven  "alternate  markers"  were 
also  identified.  These  alternates  could  be  used  equally  well  for  assessing 
performance.  The  seven  representative  "marker"  variables  and  their  alternates  are 
listed in Tables 2-4 and 2-5, respectively.

TABLE 2-4. "MARKER" VARIABLES BEST REPRESENTING
THE UNDERLYING PERFORMANCE CONCEPTS

FACTOR  PERFORMANCE CONCEPT             BEST "MARKER" FOR PERFORMANCE INDICATOR CONCEPT

1       Output per $ Cost               (RVH/OEXP) Revenue Vehicle Hour per Operating Expense
2       Utilization of Service          (TPAS/RVH) Unlinked Passenger Trips per Revenue Vehicle Hour
3       Revenue Generation per Expense  (OREV/OEXP) Operating Revenue per Operating Expense
4       Labor Efficiency                (TVH/EMP) Total Vehicle Hours per Total Employees
5       Vehicle Efficiency              (TVM/PVEH) Total Vehicle Miles per Peak Vehicle
6       Maintenance Efficiency          (TVM/MNT) Total Vehicle Miles per Maintenance Employee
7       Safety                          (TVM/ACC) Total Vehicle Miles per Accident

TABLE 2-5. BEST SET OF "MARKER" VARIABLES
AND THEIR ALTERNATES

FACTOR  BEST "MARKER" INDICATOR  GOOD ALTERNATE PERFORMANCE INDICATOR

1       RVH/OEXP                 (TVM/OEXP) Total Vehicle Miles per Operating Expense
2       TPAS/RVH                 (TPAS/RVM) Unlinked Passenger Trips per Revenue Vehicle Mile
3       OREV/OEXP                (REV/OSUB) Operating Revenue per Operating Subsidy
4       TVH/EMP                  (RVH/OEMP) Revenue Vehicle Hours per Operating Employee
5       TVM/PVEH                 (TVH/PVEH) Total Vehicle Hours per Peak Vehicle
6       TVM/MNT                  (PVEH/MNT) Peak Vehicle per Maintenance Employee
7       TVM/ACC                  (RVH/ACC) Revenue Vehicle Hours per Accident


The  first  three  factors  account  for  about  55%  of  the  variance  in  the  data.  This 
demonstrates  that  for  a  quick  performance  evaluation,  the  first  three  "markers" 
would  suffice.  This  small  subset  of  statistics  also  provides  information  for  each 
dimension  of  the  performance  concept  model  discussed  previously.  Thus,  by  using 
only  three  key  statistics,  a  transit  analyst  could  target  the  most  salient  performance 
concepts  for  individual  and  cross-sectional  transit  agency  analysis. 

The  markers  and  the  alternate  set  of  markers  are  highly  reliable  (alpha  range  is 
from  .802  to  .937).  Thus,  with  a  maximum  of  seven  variables  from  a  much  larger 
data  set,  the  performance  of  a  transit  system  can  be  evaluated.  To  assess  the  three 
major  categories  represented  in  the  Fielding,  et  al.,  conceptual  model,  the  first 
three  "marker"  variables  would  be  sufficient.  Further,  any  one  of  the  seven  factor 
concepts  identified  could  be  assessed  by  means  of  the  relevant  "marker"  variable. 
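
The text does not say which alpha statistic was computed. Assuming it refers to Cronbach's alpha, the usual internal-consistency reliability coefficient, a marker and its alternate could be checked as in the sketch below. The function name and the scores are hypothetical, and in practice the two indicators would probably be standardized first because they are measured in different units.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of equal-length score lists.

    items: one list of scores per indicator, with cases in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-case sum
    item_var = sum(variance(col) for col in items)    # sample variances
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical scores for a marker and its alternate across four systems.
marker = [1.0, 2.0, 3.0, 4.0]
alternate = [2.0, 4.0, 6.0, 8.0]
alpha = cronbach_alpha([marker, alternate])
```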


WHO  IS  NOT  WELL  REPRESENTED  IN  THE  FACTOR  ANALYSIS? 

The FY 1980 Section 15 data is somewhat biased toward the larger systems.
Although one-third of the systems reporting have twenty-five or fewer vehicles, it
is this group which is consistently missing the largest percentage of its data.
Approximately 16% of this group's vehicle miles or vehicle hours data, 59% of its
passenger data and 9% of its maintenance expense data is missing. In the final set
of thirty performance indicator variables used in the factor analysis, the small
system group was missing from 7% to 37% of its data. Thus, the small systems
group  was  not  well  represented  in  the  factor  analysis. 

This  could  have  introduced  a  bias  in  the  final  solution.  However,  when  the 
estimation  of  missing  values  procedure  was  used  on  the  data,  the  factor  structure 
that  emerged  was  consistent  with  other  results.  Therefore,  it  was  concluded  that 
the  final  factor  structure  would  remain  stable  even  with  increased  representation 
from  the  smaller  systems. 

CONCLUSION 

The  FY  1980  Section  15  data  has  been  used  to  identify  and  test  the  most  easily 
accessible  and  parsimonious  set  of  performance  indicators  for  fixed  route  transit. 
The  research  had  two  objectives:  first  to  find  the  minimum  amount  of  data 
necessary  to  provide  solid  and  stable  performance  evaluation  capability,  and  second 
to  test  the  validity  of  results  obtained  from  the  previous  analysis  of  FY  1979  data. 

The  use  of  factor  analysis  on  a  large  set  of  performance  indicator  ratios 
gleaned  from  the  data  the  structure  of  the  key  underlying  performance  concepts. 
From  the  factor  structure,  a  small  subset  of  seven  variables  was  identified  and 
tested  against  the  larger  data  structure.  These  seven  variables  are  the  most  salient 
performance  indicators  currently  available  in  the  Section  15  data  base.  They  can  be 
used  together  or  individually  to  assess  fixed  route  transit  performance. 

There  is  a  great  deal  of  confidence  in  the  data  used  and  in  the  final  results. 
Rigorous  cleaning,  verifying  and  grooming  procedures  carried  out  before  analysis 
ensured that the input data was as complete as possible. Careful decisions regarding
which  variables  to  keep  and/or  drop  from  the  analysis  provided  the  best  possible  set 
of  performance  indicators  available  for  cross-sectional  analysis  in  the  Section  15 
data.  The  use  of  four  parallel  data  sets  and  several  exploratory  factor  analyses 
detected the simple underlying structure of the data. Finally, the rigorous testing
and validation of that underlying factor structure gave convincing evidence that the most
salient  performance  indicator  concepts  had  been  found.  The  strongly  consistent  and 
stable  structure  in  the  data  led  to  identification  of  the  key  variables  for  evaluation. 
These  too  measured  up  to  testing  and  verifying  procedures.  Given  the  quality  of  the 
Section  15  data  at  hand  it  is  felt  that  the  most  salient  features  for  performance 
evaluation  have  been  determined. 


A globally relevant set of performance indicators has been detected. These
variables can be used for peer group comparisons because variables that were
problematic for such comparisons were detected and then dropped from the analysis
(e.g., fuel efficiency and social effectiveness variables, which do not lend
themselves to cross-system analysis).

The  strength  of  this  research  lies  in  both  the  quality  of  the  data  used  and  the 
rigor  with  which  the  results  were  tested.  A  relevant  set  of  performance  concepts 
has  been  identified  and  linked  to  easily  accessible  "marker"  variables  which  can  be 
used  for  cross-system  assessment  within  peer  groups  defined  by  characteristics  of 
transit  operations  in  the  next  chapter. 

CHAPTER 3
PEER  GROUP  FORMATION  AND  USE 


INTRODUCTION 

Using  performance  indicators  for  comparison  requires  the  clustering  of  similar 
systems into groups; otherwise, comparisons are misleading. There are many
different  ways  of  clustering  transit  systems — by  size,  by  mode,  by  state,  etc.  This 
chapter  describes  a  method  in  which  transit  systems  are  clustered  by  operating 
characteristics — by  size,  peak  to  base  requirements  and  speed.  Twelve  peer  groups 
are  established  and  compared  with  peer  groups  established  using  the  seven 
performance  variables  defined  in  Chapter  2.  Peer  groups  defined  by  operating 
variables  were  found  to  be  superior  for  performance  analysis.  Transit  agencies 
clustered  into  these  twelve  groups  can  be  reliably  compared  using  the  seven 
performance  indicators  defined  in  Chapter  2.  These  peer  groups  have  stability  over 
time. 

Comparison  of  performance  of  transit  systems  and  discussion  of  changes  in  the 
transit  industry  across  years  is  facilitated  by  comparison  of  systems  which  are 
similar  in  their  operating  characteristics.  Analysts  and  policy  makers  can  be  misled 
by  comparing  performance  of  systems  which  are  essentially  unlike  one  another. 
Construction  of  peer  groups  of  transit  systems  allows  individual  systems  to  be 
compared  to  others  which  are  similar,  rather  than  with  systems  which  differ  in  their 
operating  environments.  In  addition,  the  relationship  between  operating 
characteristics  and  performance  can  be  examined  by  focusing  on  differences  in 
performance  across  peer  groups  with  different  operating  characteristics.  Finally, 
transit  industry  changes  across  years  can  be  viewed  in  relation  to  operating 
characteristics  of  systems  by  comparing  peer  groups. 

TYPOLOGY  FOR  TRANSIT 

Separating  transit  systems  into  peer  groups  which  share  similar  operating 
characteristics  is  analogous  to  separating  any  set  of  objects  into  a  small  number  of 
groups  in  which  members  of  the  same  group  are  more  similar  to  each  other  than  to 
objects  in  other  groups,  and  the  groups  differ  from  one  another.  Problems  of  this 
sort  are  common  in  the  social  and  biological  sciences  and  in  applied  settings  such  as 
marketing  research.  One  example  of  the  application  of  such  analysis  in  marketing 
research  is  the  clustering  of  neighborhoods  based  on  demographic  characteristics 
from census data as the basis for targeting market segments. In biological sciences
researchers  often  use  cluster  analysis  as  an  aid  to  classifying  plants  or  animals  into 
clusters  based  on  their  anatomical  similarity.  The  results  of  such  analyses  are  the 
assignment  of  each  object  to  one  and  only  one  of  the  groups  or  clusters. 

The  initial  question  in  the  formation  of  peer  groups  of  transit  agencies  based  on 
operations  is  how  operating  environment  is  to  be  measured.  Ideally  one  would  use 
demographic  variables  such  as  service  area  population  density  to  determine  the 
operating  environment  of  each  system.  However,  since  these  are  not  available  in  a 
form compatible with the level of reporting in the Section 15 data, four
differentiating variables were chosen to measure inherent differences in operations. These
variables  are:  total  vehicle  miles,  number  of  peak  vehicles,  speed  and  peak  to  base 
ratio.  Each  reflects  some  aspect  of  the  operating  environment  within  which  a 
transit  system  operates.  Total  vehicle  miles  and  number  of  peak  vehicles  measure 
the  overall  size  of  the  system.  Total  vehicle  miles  relates  to  maintenance  and 
capital  needs  of  the  transit  system  because  it  measures  the  actual  usage  of 
vehicles.  Peak  vehicles  reflect  the  daily  maximum  capacity  of  the  system  and  the 
resultant  labor  needs  in  terms  of  drivers  and  management.  Differences  in  speed 
capture  the  difference  between  urban  and  suburban  systems.  Peak  to  base  ratio 
indicates  the  degree  to  which  a  system  is  oriented  to  peak  service.  In  the  absence 
of  demographic  data  directly  measuring  service  area  characteristics  like  population 
density,  household  income  and  trip  patterns,  these  variables,  which  are  available  in 
the  Section  15  data,  tap  important  variations  in  operating  characteristics  of  transit 
systems. 

Formation  of  peer  groups  requires  grouping  together  agencies  which  have 
similar  profiles  across  these  four  operating  variables.  For  example,  two  agencies 
which  are  both  large,  slow  and  have  high  peak  to  base  ratios  should  be  placed  in  the 
same  peer  group,  whereas  systems  which  are  small  and  fast  should  be  assigned  to  a 
different  peer  group.  The  goal  is  to  construct  peer  groups  so  that  agencies  within  a 
group  are  similar  to  each  other,  and  different  from  agencies  in  other  peer  groups. 
On  the  average  agencies  in  one  peer  group  will  have  profiles  of  operations  which  are 
distinct  from  those  of  other  groups. 

Cluster  Analysis 

Several  data  analysis  methods  exist  for  such  analysis,  including  cluster  analysis, 
multidimensional  scaling,  and  Q  factor  analysis.  In  this  research  cluster  analysis 
was  chosen  as  the  analytic  tool  for  constructing  peer  groups  because  in  contrast  to 
multidimensional scaling and Q factor analysis, cluster analysis provides a grouping
of the objects into a number of distinct groups, and cluster analysis routines are
available which handle a large number of cases, such as are present in the Section 15
data.  Cluster  analysis  is  a  technique  ideally  suited  for  forming  peer  groups  because 
it  provides  an  objective  means  for  defining  how  similar  objects  are  and  an  objective 
means  for  forming  peer  groups  based  on  these  similarities. 

Cluster  analysis  is  a  general  term  referring  to  a  large  number  of  procedures 
which  have  in  common  the  goal  of  constructing  groups  of  items  (either  data  cases  or 
variables)  based  on  their  similarity  across  a  profile  of  observations.  The  result  of  a 
cluster  analysis  is  the  formation  of  a  number  of  groups  of  items  and  the  assignment 
of  each  item  to  one  of  these  groups.  A  summary  of  many  of  the  techniques  for 
doing cluster analysis can be found in Everitt.¹

Cluster  analysis,  and  similar  techniques  which  construct  groupings  of  the  data, 
differ  from  methods  such  as  discriminant  analysis  which  attempt  to  classify  objects 
into  known  groups.  The  latter  type  of  analyses  are  different  from  cluster  analysis  in 
that  they  require  that  the  groups  be  known  in  advance,  whereas  cluster  analysis 
constructs  the  groups. 

The  most  common  and  frequently  used  clustering  methods  are  "hierarchical" 
clustering  methods.  Such  procedures  form  clusters  in  a  series  of  steps.  The  most 
common  of  these  methods  begins  with  each  object  belonging  in  a  cluster  by  itself, 
and  each  step  joins  two  clusters  from  the  previous  step  into  one  more  inclusive 
cluster.  The  procedure  continues,  joining  clusters  at  each  step  until  at  the  final  step 
all  objects  are  joined  into  one,  all  inclusive,  cluster.  At  each  step  in  the  process, 
cases  which  are  relatively  more  similar  to  each  other  will  be  in  the  same  group. 
Since  a  hierarchical  clustering  solution  provides  a  series  of  groupings  from  one  in 
which  each  case  is  an  individual  cluster,  to  one  in  which  all  cases  are  joined  into  the 
same cluster, the researcher must choose a level in the hierarchical series of
clusters  which  provides  a  useful  and  meaningful  number  of  clusters  of  the  data. 

The  decision  as  to  the  number  of  clusters  present  in  the  data  is  an  important 
issue  in  any  cluster  analysis.  Some  clustering  methods  provide  a  single  partition  of 
the data into a pre-specified number of groups. The K-means procedure discussed
below  is  an  example  of  such  a  procedure.  However,  most  hierarchical  procedures 
provide a series of clusters, from least to most inclusive. The researcher must
decide which set of clusters provides the most meaningful and useful grouping of the
data.

¹ B. Everitt, Cluster Analysis (London: Heinemann, 1980).

DESCRIPTION  OF  CLUSTERING  TECHNIQUES 

In  forming  the  peer  groups  of  transit  systems,  three  hierarchical  clustering 
techniques were used in order to ensure that the final results were not simply a
function  of  the  particular  technique  which  was  chosen.  In  this  section  these  three 
techniques  are  explained. 

The most important feature which distinguishes among clustering techniques is
the rule by which items are included in a cluster, and by which clusters are joined
together. Many rules exist for doing this. The three different methods used in this
research were: single link, centroid and K-means clustering. These differ in the rule
used for forming clusters. Descriptions provided here are fairly basic, and the
reader who would like more detailed descriptions is referred to Dixon.²

Single  Link  Clustering 

Single  link  clustering  is  a  hierarchical  clustering  technique.  The  procedure 
starts  with  information  about  the  similarity  (or  dissimilarity)  among  all  pairs  of 
items  to  be  clustered.  In  the  current  analysis,  the  input  was  the  dissimilarity 
between  pairs  of  transit  agencies  based  on  their  operating  characteristics.  The 
single  link  method  starts  initially  with  each  case,  here  a  transit  agency,  as  a  distinct 
cluster.  The  analysis  proceeds  through  a  series  of  steps,  at  each  step  combining  two 
clusters  (or  individual  cases)  to  form  a  larger  cluster.  The  criterion  used  to  join 
clusters  is  that  the  two  clusters  are  joined  which  have  the  smallest  difference 
linking  any  single  member  of  one  cluster  with  any  single  member  of  the  other 
cluster.  In  other  words,  the  two  individual  cases  in  different  clusters  which  are 
most  similar  cause  their  respective  clusters  to  be  joined.  The  process  continues 
until  all  cases  are  joined  into  a  single,  all  inclusive  cluster. 
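
The joining rule described above can be sketched in a few lines. This is a naive illustration rather than the BMDP routine used in the research: it stops at a requested number of clusters instead of building the full hierarchy, it uses `math.dist` (Python 3.8 or later) as the dissimilarity, and the function name and data points are hypothetical.

```python
import math

def single_link(points, n_clusters):
    """Agglomerate until n_clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # each case starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # join the two clusters
        del clusters[b]
    return clusters

# Hypothetical standardized profiles for five agencies on two variables.
agencies = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.5), (10.0, 0.0), (10.0, 1.0)]
groups = single_link(agencies, n_clusters=2)
```

Because only the closest pair of members matters, the third agency joins the first group even though it is 2.5 units from that group's farthest member.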

Centroid  Clustering 

The centroid method is similar to the single link clustering method in that it
proceeds by joining clusters (or cases) in a series of steps; however, it differs in the
rule it uses to join the clusters. The centroid method assigns cases to clusters, or
joins clusters together, on the basis of the distance between a case and the center of
a cluster, or the distance between the centers of two clusters. At the initial stage of
the centroid clustering procedure each case is a single cluster. At each pass of the
clustering process the two clusters which are closest together are joined to form a
new cluster. This process continues until at the final step all cases are joined into a
single cluster. The closeness between clusters, which is used as the basis for joining
clusters, is the Euclidean distance between the locations of the clusters. The
location of a cluster is based on its values on the original variables in the analysis
(combined across the members of the cluster). When two cases or clusters are
identical in their values on the variables, the distance between them will be zero.
When two cases or clusters are quite different in their values, the distance between
them will be quite large.

² W.J. Dixon, Ed., BMDP Statistical Software 1981 (Los Angeles: University of
California Press, 1981).

When  cases  are  combined  into  a  new  cluster  the  location  of  this  new  cluster  on 
the  variables  is  computed  by  taking  the  average  of  the  values  on  each  of  the 
variables,  weighted  by  the  number  of  cases  in  the  cluster.  This  location  is  called  the 
centroid  of  the  cluster,  and  is  used  for  computing  the  distance  from  that  cluster  to 
other  clusters. 
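
A minimal sketch of the centroid rule follows. It is an illustration under stated assumptions, not the BMDP program: it stops at a requested number of clusters, and the function name and data points are hypothetical. The size-weighted average in the merge step is the centroid computation described above.

```python
import math

def centroid_cluster(points, n_clusters):
    """Join the two clusters with the closest centroids until n_clusters remain.

    Returns a list of (centroid, size, member indices) tuples.
    """
    clusters = [(list(p), 1, [i]) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        # Euclidean distance between every pair of cluster centroids.
        pairs = [(math.dist(clusters[a][0], clusters[b][0]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        (ca, na, ma), (cb, nb, mb) = clusters[a], clusters[b]
        # New centroid: average of the two centroids weighted by cluster size.
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (merged, na + nb, ma + mb)
        del clusters[b]
    return clusters

# Hypothetical standardized profiles for five agencies on two variables.
agencies = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.5), (10.0, 0.0), (10.0, 1.0)]
peer_groups = centroid_cluster(agencies, n_clusters=2)
```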


K-means  Clustering 

K-means clustering is similar to centroid clustering in many respects; however,
rather than joining small clusters to form larger ones, this method divides large
clusters into smaller ones. This is often referred to as divisive rather than
agglomerative clustering. In addition, this procedure only reports solutions for a
previously specified number of clusters. The K-means program begins with all cases
in one cluster and then at each step in the clustering procedure divides a cluster into
two smaller clusters. Clusters are divided on the basis of the distance between their
centers. (See the discussion of centroid clustering for a definition of a cluster's
center.) The division of large clusters into smaller ones proceeds until a prespecified
number of clusters is produced. The final step in the K-means procedure is the
reevaluation of the cluster assignment of each case, and the reassignment of cases
to new clusters, if another cluster is closer than the original.

³ The Euclidean distance between two single cases (i and j) defined across the
variables (k) is:

$$d_{ij} = \left[ \sum_{k} (X_{ik} - X_{jk})^2 \right]^{1/2}$$

where $X_{ik}$ is the value for case i on variable k and $X_{jk}$ is the value for case j on
variable k.

One  of  the  drawbacks  of  the  K-means  procedure  is  the  need  to  specify  the 
number  of  clusters  the  program  is  to  report.  This  is  a  problem  for  exploratory  data 
analysis  since  the  number  of  clusters  present  in  the  data  usually  is  not  known. 
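
The final reassignment step of the K-means procedure can be sketched as follows. The sketch does not reproduce the splitting stage of the program; it assumes a starting assignment is already available, assumes no cluster becomes empty during reassignment, and uses hypothetical function names and data.

```python
import math

def centers_of(points, labels, k):
    """Mean of each variable over the members of each cluster."""
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers

def reassign(points, labels, k, max_passes=100):
    """Move each case to its nearest cluster center until nothing changes."""
    labels = list(labels)
    for _ in range(max_passes):
        centers = centers_of(points, labels, k)
        new = [min(range(k), key=lambda c: math.dist(p, centers[c]))
               for p in points]
        if new == labels:
            break  # every case is already in its closest cluster
        labels = new
    return labels, centers_of(points, labels, k)

# Hypothetical standardized profiles; the third case starts in the wrong group.
agencies = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.5), (10.0, 0.0), (10.0, 1.0)]
final_labels, final_centers = reassign(agencies, [0, 0, 1, 1, 1], k=2)
```

Note that the number of clusters `k` has to be supplied in advance, which is the drawback discussed above.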

General  Issues  in  Cluster  Analysis 

Two issues are common to all of these clustering methods: first, the decision as
to  the  number  of  clusters  in  the  data,  and  second,  the  problem  of  how  to  combine 
the  information  about  cases  into  a  measure  of  the  similarity  or  dissimilarity  among 
cases. 

There  is  no  fixed  rule  for  deciding  the  number  of  clusters  present  in  the  data. 
In  the  most  extreme  case  each  item  could  be  assigned  to  its  own  individual  cluster, 
or,  on  the  other  hand,  all  cases  could  be  combined  into  one  all  inclusive  cluster.  The 
issue  is  to  choose  a  point  in  the  series  of  clusters,  from  least  to  most  inclusive, 
which  provides  a  useful  and  meaningful  grouping  of  the  data. 

Choice  as  to  the  number  of  clusters  is  made  in  view  of  the  substantive  research 
problem  and  the  group  structure  of  the  items  being  clustered.  In  the  current  context 
the  problem  is  to  choose  a  number  of  peer  groups  of  transit  agencies  so  that  there 
are  enough  groups  to  capture  the  major  differences  among  agencies,  but  so  that 
there  are  not  so  many  groups  that  the  fine  grained  distinctions  among  them  are  not 
useful.  In  addition,  it  is  important  to  have  groups  which  are  neither  too  small,  so 
that  a  given  agency  has  few  peers,  nor  too  large,  so  that  members  of  a  group  differ 
greatly  from  each  other.  Complex  statistical  procedures  exist  for  making  this 
decision,  but  were  not  used  in  this  research.  In  single  link  and  centroid  clustering 
solutions,  the  decision  about  the  number  of  clusters  is  made  after  one  views  the 
results  of  the  analysis.  However,  in  the  K-means  clustering  the  number  of  clusters 
must  be  specified  prior  to  the  analysis. 

The  second  issue  which  is  common  to  clustering  analyses  is  how  to  measure  the 
similarity  among  the  items  to  be  clustered.  In  the  current  analysis  274  transit 
agencies  had  complete  data  on  each  of  the  four  operating  characteristics:  total 
vehicle  miles,  number  of  peak  vehicles,  speed  and  peak  to  base  ratio.  The  question
is  how  the  information  on  these  variables  should  be  combined  to  measure  how
similar  transit  agencies  are  to  each  other  in  their  operations.  Two  issues  need  to  be
considered  in  arriving  at  the  measure.

First,  these  four  variables  are  measured  on  quite  different  scales.  For  example,
total  vehicle  miles  (in  10,000's)  range  from  1  to  10869,  while  peak  to  base  ratios 
range  from  .6  to  5.0.  If  one  were  to  combine  these  into  a  single  measure,  the 
differences  on  variables  with  large  values  (for  example  total  vehicle  miles)  would 
swamp  the  differences  on  variables  with  small  values  (for  example  peak  to  base 
ratios),  giving  total  vehicle  miles  extra  weight  in  the  calculation.  In  order  to 
overcome  this  problem  it  is  necessary  to  express  all  variables  on  the  same  scale. 
This  is  done  by  standardizing  all  variables  by  transforming  them  to  Z  scores.  The 
mean  of  a  variable  is  subtracted  from  each  value  on  the  variable,  and  then  the  value 
is  divided  by  the  standard  deviation  of  the  variable.  The  resulting  standardized 
variables  all  have  means  of  zero  and  standard  deviations  of  one. 
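
The standardization step can be illustrated with a minimal Python sketch. The values below are hypothetical and are not drawn from the Section 15 data, and the use of the sample standard deviation (n - 1 divisor) is an assumption, since the report does not state which divisor was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor); an assumption
    return [(x - m) / s for x in values]

# Hypothetical values for five agencies (illustrative only).
vehicle_miles = [3.0, 86.7, 205.0, 1259.3, 9850.2]  # in 10,000s
peak_to_base = [1.0, 1.2, 1.9, 1.4, 1.7]

z_miles = z_scores(vehicle_miles)
z_ratio = z_scores(peak_to_base)

# After standardization both variables have mean 0 and standard deviation 1,
# so neither dominates a distance computed across them.
print(round(stdev(z_miles), 6), round(stdev(z_ratio), 6))
```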

The  second  question  is  how  to  combine  four  measures  on  operations  for  each 
agency  into  one  measure  of  dissimilarity  between  each  pair  of  agencies.  In  this 
research  the  dissimilarity  between  agencies  was  measured  by  taking  the  Euclidean 
distance  between  cases  across  the  four  standardized  operating  characteristics.  The 
formula  for  Euclidean  distance  is  given  in  footnote  3,  above. 
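
As a small worked illustration of this dissimilarity measure, the sketch below computes the Euclidean distance between two agencies. The two profiles are hypothetical standardized values, and the function name is introduced here only for the example.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences across all variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical Z scores, in the order:
# total vehicle miles, peak vehicles, speed, peak to base ratio.
agency_a = (0.5, 0.4, -1.0, 0.2)
agency_b = (0.5, 0.4, 2.0, 4.2)

# The agencies match on size but differ by 3 on speed and 4 on peak to base
# ratio, so the distance is sqrt(0 + 0 + 9 + 16).
d = euclidean_distance(agency_a, agency_b)
print(d)
```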

PEER  GROUPS  BASED  ON  OPERATING  CHARACTERISTICS 

This  section  describes  the  formation  of  peer  groups  of  transit  agencies  based  on 
their  operating  characteristics.  The  final  peer  groups  were  formed  using  the  centroid
method  of  hierarchical  clustering.  Single  link  and  K-means  clustering  were  also  used
to  group  the  agencies  into  clusters;  however,  these  groups  were  judged  to  be  less 
satisfactory  than  those  produced  by  the  centroid  method. 

Formation  of  Peer  Groups  using  Centroid  Clustering 

The  twelve  peer  groups  of  fixed  route,  motor  bus  systems  were  defined  on  the 
basis  of  four  operating  variables:  total  vehicle  miles,  number  of  peak  vehicles,  speed 
and  peak  to  base  ratio.  Centroid,  hierarchical  clustering,  as  implemented  in  the 
BMDP  package  of  statistical  analysis  programs,  was  used  to  form  the  clusters.  The 
centroid  cluster  analysis  included  274  of  the  304  transit  agencies  in  the  FY  1980
Section  15  data.  The  remaining  30  agencies  were  missing  data  on  one  or  more  of  the
four  operating  variables  and  were  excluded  from  the  analysis.  All  variables  were 
standardized  to  Z  scores  (as  described  above)  prior  to  analysis.  The  closeness  of 
clusters  was  measured  using  the  Euclidean  distance  between  their  locations. 
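
The general idea of centroid clustering can be sketched in a few lines of Python. This is not the BMDP implementation; it is a simplified illustration that assumes the input points are already standardized, and it stops at a requested number of clusters only for brevity, whereas the analysis reported here inspected the full hierarchy before choosing a level. The function names and sample points are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def centroid_clustering(points, n_clusters):
    """Repeatedly merge the two clusters whose centroids are closest."""
    clusters = [[p] for p in points]  # every case starts in its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical two-variable cases forming two tight pairs and one isolate.
sample = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = centroid_clustering(sample, 3)
print(result)
```

Recording the centroid distance at each merge, rather than stopping early, would give the hierarchical series from which a level is chosen by inspection.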

The  analysis  produced  a  hierarchical  series  of  clusters.  Inspection  of  the  final 
solution  indicated  that  there  were  twelve  clear  clusters  of  transit  agencies,  and
that  two  agencies  could  not  be  assigned  to  any  cluster.  Peer  group  assignments  for
the  274  transit  agencies  are  in  Appendix  I.

The  results  of  this  method  of  clustering  were  superior  to  those  of  the  other
clustering  methods,  since  it  produced  distinct  clusters  of  moderate  size  and  assigned
all  but  two  of  the  agencies  to  clusters.  The  final  solution  provided  twelve  peer 
groups  ranging  in  size  from  2  to  78  members.  This  was  judged  to  be  a  more 
satisfactory  division  of  transit  agencies  than  that  resulting  from  either  single  link  or 
K-means  clustering. 

Single  Link  Clustering 

Two  clustering  analyses  were  done  using  single  link  clustering.  One  was  done 
with  all  four  operating  variables  (total  vehicle  miles,  number  of  peak  vehicles,
speed  and  peak  to  base  ratio),  and  a  second  analysis  was  done  using  three  of  these
variables  (excluding  total  vehicle  miles).  The  single  link  analysis  using  the  four 
variables  included  274  of  the  304  systems.  The  remaining  thirty  systems  had  data 
missing  on  one  or  more  of  the  four  operating  measures  and  were  excluded  from  the 
analysis.  The  single  link  analysis  with  three  variables  included  275  of  the  agencies. 

Neither  the  single  link  clustering  with  four  operating  variables,  nor  the  one  with 
three  variables  produced  a  useful  or  meaningful  grouping  of  the  transit  agencies.  At 
an  intermediate  level  in  the  clustering  analysis  with  four  variables  there  were  34 
separate  clusters.  One  cluster  contained  78  agencies,  a  second  cluster  contained  41 
agencies  and  the  remaining  32  clusters  were  quite  small  with  between  2  and  10
members.  There  was  no  other  point  in  the  set  of  clusters  which  provided  a  better 
grouping  of  the  data. 

The  single  link  analysis  with  three  variables  also  failed  to  provide  a  useful  set 
of  clusters.  The  results  were  similar  to  the  analysis  with  four  variables.  At  an 
intermediate  level  there  were  30  clusters,  one  containing  78  cases,  one  with  20  and 
the  others  with  between  2  and  10  cases  each. 

Both  of  these  clustering  solutions  exhibit  a  problem  which  is  common  in  single 
link  clustering,  called  "chaining".  In  such  a  result  cases  are  added  one  after  another 
to  a  single  cluster,  rather  than  being  placed  in  a  number  of  distinct  clusters.  This 
does  not  provide  a  useful  grouping  of  the  data. 
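
Chaining can be reproduced on a toy example. The sketch below is a simplified single link procedure written for illustration (it is not the program used in this research), and the points are hypothetical. Six evenly spaced cases are absorbed one at a time into a single cluster, even though the two ends of that cluster are farther apart than the cluster is from its nearest neighbor.

```python
import math

def single_link_distance(a, b):
    """Distance between the closest pair of members, one from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)

def single_link_clustering(points, n_clusters):
    """Repeatedly merge the two clusters with the smallest single link distance."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = [
            (single_link_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Hypothetical cases: a string of six points one unit apart, then two isolates.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (4.0, 0.0), (5.0, 0.0), (8.0, 0.0), (11.0, 0.0)]
clusters = single_link_clustering(points, 3)
sizes = sorted(len(c) for c in clusters)

chain = max(clusters, key=len)
span = math.dist(chain[0], chain[-1])  # distance between the ends of the chain
print(sizes, span)
```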

K-means  Clustering 

The  K-means  clustering  procedure  was  used  employing  all  four  operating 
variables.  Two  solutions  were  produced,  one  with  ten  and  one  with  twelve  clusters.
These  values  were  chosen  because  they  were  in  the  range  of  the  number  of  clusters
produced  by  the  centroid  method. 

The  ten  cluster  solution  produced  three  groups  with  only  one  or  two  members  in 
each  and  an  additional  three  groups  with  more  than  60  members  each.  Seventy-two 
percent  of  the  agencies  fell  into  one  of  these  three  large  groups.  The  remaining 
four  clusters  had  between  14  and  24  cases  each.  Clusters  which  are  quite  large  or 
quite  small  are  not  useful  for  practical  purposes;  therefore  this  analysis  was
rejected  as  the  basis  for  defining  the  peer  groups. 

The  solution  with  twelve  clusters  slightly  reduced  the  sizes  of  the  three  large 
groups  from  the  ten  cluster  analysis  by  forming  new  groups  or  placing  cases  from 
these  large  groups  into  other  groups.  However,  each  of  these  three  large  groups  still
had  more  than  56  members.  In  addition,  there  were  four  groups  with  only  one
or  two  members  each.  Several  factors  led  us  to  reject  this  as  our  final  solution  for
definition  of  the  peer  groups.  Most  importantly,  the  need  for  prior  specification  of 
the  number  of  clusters  present  in  the  data  presumes  that  one  knows  in  advance  the 
number  of  peer  groups  present  in  the  sample  of  transit  agencies.  Second,  the  groups 
which  are  produced  using  this  method  seem  to  be  quite  sensitive  to  a  few  cases 
which  have  extreme  or  unusual  values  on  a  single  variable.  That  is,  clusters  with 
very  few  members  are  formed  to  accommodate  cases  which  have  extreme  values  on
one  variable  thus  forcing  the  other  cases  to  be  lumped  into  a  few  large  clusters. 
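
Both drawbacks can be seen in a minimal K-means sketch. This is a plain version of the iterative reassignment idea, not the BMDP program, and the starting centers and data are hypothetical. The number of clusters is fixed in advance by the number of starting centers, and a single extreme case ends up with a cluster to itself while two visibly separate groups are lumped together.

```python
import math

def k_means(points, initial_centers, max_iter=100):
    """Assign each point to its nearest center, recompute centers, repeat."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return groups, centers

# Hypothetical cases: two natural groups near 0 and near 2, plus one extreme case.
points = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0),
          (2.0, 0.0), (2.2, 0.0), (2.4, 0.0),
          (10.0, 0.0)]

# The number of clusters (two here) must be chosen before the analysis.
groups, centers = k_means(points, initial_centers=[(0.0, 0.0), (2.0, 0.0)])
print([len(g) for g in groups])
```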

Comparison  of  the  twelve  cluster  solution  using  the  K-means  procedure  with 
the  twelve  peer  groups  defined  on  the  basis  of  the  centroid  method  provides  a  means 
for  checking  the  peer  group  solution.  If  two  different  methods  reveal  similar 
clusters  of  transit  agencies,  confidence  in  the  groups  is  increased.  We  can  be  more 
confident  that  the  results  are  not  due  to  a  peculiarity  of  the  particular  analytic 
technique.  Comparison  of  the  peer  groups  from  the  centroid  clustering  method  with 
those  from  the  K-means  technique  provides  such  a  test.  Comparing  the  twelve  peer 
groups  produced  by  these  two  methods  reveals  that  for  80%  of  the  cases  peer  group 
assignment  is  the  same  based  on  the  two  different  methods.  Four  of  the  twelve  peer 
groups  from  the  centroid  clustering  solution  were  kept  entirely  intact  in  the 
K-means  analysis,  though  in  three  groups  from  the  centroid  analysis  other  agencies 
were  added  in  the  K-means  solution.  In  seven  other  peer  groups  from  the  centroid 
analysis  more  than  half  of  the  cases  remained  together,  and  one  small  peer  group 
with  8  cases  was  divided  among  four  K-means  groups.  This  indicates  a  high 
correspondence  between  the  two  methods,  and  adds  support  to  the  centroid 
clustering  solution. 
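
The report does not state how the 80% figure was calculated. One plausible reading, sketched below as an assumption, is to match each centroid peer group to the K-means cluster that holds most of its members and to count the cases that stay with that majority. The labels and function name are hypothetical.

```python
from collections import Counter, defaultdict

def agreement_rate(labels_a, labels_b):
    """Share of cases that stay with the majority of their group from labels_a."""
    by_group = defaultdict(list)
    for case, group in labels_a.items():
        by_group[group].append(labels_b[case])
    # For each group in labels_a, count members sharing its most common
    # label in labels_b.
    matched = sum(Counter(bs).most_common(1)[0][1] for bs in by_group.values())
    return matched / len(labels_a)

# Hypothetical assignments for five agencies under the two methods.
centroid_groups = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
k_means_groups = {"A": "x", "B": "x", "C": "y", "D": "y", "E": "y"}

# Agency C leaves the majority of its centroid group, so 4 of 5 cases agree.
rate = agreement_rate(centroid_groups, k_means_groups)
print(rate)
```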

COMPARISON  OF  1979  AND  1980  CLUSTERS

There  are  substantive  differences  between  the  cluster  structures  discovered  in 
the  1979  and  1980  data.  In  1980  there  are  twelve  clusters,  whereas  there  were  only 
eight  in  1979.  In  part  this  is  a  consequence  of  the  fact  that  fewer  cases  entered  into 
the  1979  data  analysis  because  of  missing  data  problems.  In  addition  the  1980  data
analysis  was  based  on  more  valid  and  reliable  data.  Despite  these  differences,  there 
is  a  basic  underlying  similarity  between  the  two  analyses. 

One  hundred  and  eighty  seven  transit  systems  entered  into  the  analyses  in  both 
years.  Five  of  the  1979  cluster  groups  are  essentially  the  same  in  1980.  The  three 
peer  groups  that  changed  between  the  two  years  were  those  of  smaller  systems. 
Many  more  of  these  smaller  systems  entered  into  the  analysis  in  1980  and  thus  the 
peer  group  structure  is  finer  grained  for  this  size  range  in  1980.  Many  of  the  new 
groups  in  1980  are  composed  of  these  smaller  systems.  Overall,  the  1980  analysis 
can  be  considered  more  accurate  and  detailed  for  the  smaller  systems. 

Even  the  peer  groups  that  are  essentially  the  same  had  minor  differences.  Each 
transit  system  that  did  not  stay  with  its  peer  group  between  the  two  years  was 
examined  to  see  if  it  had  changed  in  any  way.  About  half  of  these  had  substantial 
differences  in  their  basic  operating  characteristics  and  thus  a  change  in  peer  group 
is  an  accurate  reflection  of  a  change  in  the  transit  system.  These  kinds  of  changes 
are  expected.  The  other  systems  which  changed  peer  groups  were  at  the  boundaries 
of  their  peer  group,  i.e.  they  were  somewhat  extreme  in  some  characteristic 
relative  to  their  peer  group.  These  changes  are  also  minor  and  logical  given  the 
increased  detail  of  the  1980  analysis  and  the  more  careful  determination  of  borders 
in  that  analysis.

VALIDATION  OF  PEER  GROUPS 

This  section  examines  whether  agencies  in  the  twelve  peer  groups  formed  using 
the  centroid  method  of  clustering  in  fact  differ  in  their  operating  characteristics,  or 
whether,  on  the  other  hand,  the  groupings  of  systems  fail  to  capture  differences  in 
the  operating  characteristics  of  their  members. 


^Shirley  C.  Anderson  and  Gordon  J.  Fielding,  Comparative  analysis  of  transit
performance.  Final  Report  No.  UMTA-CA-11-0020-1.  (Irvine,  Calif.:  University
of  California,  Institute  of  Transportation  Studies,  January,  1982.)  (NTIS  No.
PB82-196478).

Several  analyses  were  done  in  order  to  demonstrate  the  validity  and  robustness
of  the  twelve  peer  groups.  The  analyses  reported  in  this  chapter  focus  on  the 
internal  validity  of  the  peer  groups;  that  is,  whether  they  reflect  differences  on  the 
original  variables  which  were  used  to  form  them.  Chapter  4  reports  on  the
predictive  capabilities  of  the  peer  groups;  that  is,  their  relationship  to  other  factors, 
such  as  performance,  which  were  not  used  in  their  formation. 

Description  of  Peer  Groups 

One  of  the  most  straightforward  demonstrations  that  the  peer  groups  differ  in 
their  operating  characteristics  is  to  examine  the  average  characteristics  of  the 
transit  agencies  in  each  peer  group.  Statistics  describing  the  total  vehicle  miles, 
number  of  peak  vehicles,  speed  and  peak-to-base  ratios  of  the  peer  groups  are 
presented  in  Table  3-1.  Inspection  of  these  values  indicates  that  although  there  is
variation  within  each  peer  group,  the  peer  groups  do  in  fact  differ  from  each  other 
in  their  operating  characteristics.  To  the  extent  that  we  can  describe  the 
differences  in  these  groups,  and  make  predictions  about  peer  group  membership 
based  on  operating  characteristics,  our  confidence  in  the  validity  of  the  groups  is 
increased. 

The  two  private  bus  companies  in  Peer  Group  1  stand  out  because  of  their 
extremely  high  average  speed.  They  are  the  smallest  in  size  and  the  lowest  in  peak 
to  base  ratios  relative  to  other  peer  groups. 

Peer  Group  2  consists  of  transit  providers  primarily  located  in  small  urban  areas 
or  suburban  areas  across  the  United  States  with  populations  under  500,000.  They  are 
small  (1  to  46  peak  vehicles),  fast  (17  to  22  miles  per  hour)  and  have  average  peak  to 
base  ratios. 

Although  Peer  Group  3  is  a  cross-national  group,  Southwestern  systems  are
disproportionately  represented.  While  a  few  systems  are  in  the  suburban  fringes  of 
major  urban  areas,  most  are  in  small  cities  or  towns.  These  systems  are  small  (2  to 
74  peak  vehicles)  with  low  peak  to  base  ratios  (1.0  to  1.15)  and  above  average  speeds. 

Peer  Group  4  draws  from  all  parts  of  the  country  despite  its  small  size.  These 
systems  serve  small  cities  with  suburban  characteristics.  Systems  in  Peer  Group  4 
have  a  high  average  speed  (15.9  to  16.8  miles  per  revenue  vehicle  hour).  They  tend 
to  be  small  (fewer  than  50  peak  vehicles)  with  low  peak  to  base  ratios.  Their  speed 
is  consistent  with  their  suburban  locations.

TABLE  3-1.  DESCRIPTIVE  STATISTICS  FOR  PEER  GROUPS

Peer Group                  Peak Vehicle Miles               Peak to
(n)         Statistic   Vehicles      (10,000)     Speed  Base Ratio

1           mean              13         193.8     27.88        1.02
(2)         std. dev.         17         269.7      2.99        0.03
            minimum            1           3.0     25.77        1.00
            maximum           25         384.5     30.00        1.04

2           mean              14          86.7     19.55        1.24
            std. dev.         12          98.8      1.42        0.39
            minimum            1           6.6     17.21        1.00
            maximum           46         405.6     21.34        2.30

3           mean              20         101.6     14.51        1.10
(44)        std. dev.         18          95.4      0.65        0.16
            minimum            2           4.4     13.54        0.80
            maximum           74         435.1     15.65        1.50

4           mean              22         108.0     16.23        1.10
(7)         std. dev.         15          89.8      0.38        0.05
            minimum           10          39.2     15.88        1.00
            maximum           47         257.6     16.76        1.15

5           mean              26          83.7      8.91        1.52
(15)        std. dev.         30          93.9      0.91        0.59
            minimum            1           1.4      7.50        0.57
            maximum          107         518.5     10.86        2.10

6           mean              28         126.7     12.19        1.11
(45)        std. dev.         56         168.0      0.65        0.12
            minimum            2           7.5     10.79        1.00
            maximum          192         850.8     15.49        1.59

7           mean              57          205.     12.80        1.85
(78)        std. dev.         50          180.      1.50        0.27
            minimum            4           14.      9.65        1.57
            maximum          225          817.     16.26        2.47

8           mean             158         453.7     12.69        2.88
(55)        std. dev.        104         566.0      2.05        0.52
            minimum            5           4.7      8.55        2.51
            maximum          387        1549.4     18.14        5.61

9           mean             250        1259.3     15.72        1.40
(8)         std. dev.         72         316.2      1.03        0.28
            minimum           96         769.0     14.56        1.11
            maximum          329        1635.4     17.32        1.86

10          mean             393        1723.0     11.10        1.76
(8)         std. dev.         94         451.1      1.78        0.33
            minimum          260        1058.6      8.18        1.10
            maximum          506        2385.3     13.65        2.07

11          mean             889        3465.7     13.53        2.48
(13)        std. dev.        251        1055.0      2.12        0.42
            minimum          666        2405.8     10.17        1.66
            maximum         1573        5688.0     18.40        3.14

12          mean            2477        9850.2     10.58        1.74
(3)         std. dev.        789        1331.6      3.62        0.22
            minimum         1914        8343.4      6.45        1.60
            maximum         3378       10868.7     13.23        2.00

Total       mean             125         519.9     13.40        1.68
            std. dev.        316        1270.3      2.89        0.94
            minimum            1           1.4      4.81        0.57
            maximum         3378       10868.7     30.00       13.00
            number           297           279       277         297

Peer  Group  5  is  unusual  in  that  nearly  half  of  its  members  are  private  bus
companies  in  the  urban  New  York  City  area,  while  most  of  the  rest  are  small 
mid-western  city  agencies.  The  systems  in  this  group  are  distinguished  by  their  very 
low  speeds.  They  are  slightly  below  average  in  size,  and  average  in  peak  to  base 
ratios. 

Peer  Group  6  draws  systems  from  most  regions  of  the  United  States  but  with  a 
particular  emphasis  on  the  Midwest  and  South  central  regions.  While  a  few  medium 
sized  cities  are  included  in  this  group,  many  of  the  systems  serve  small  towns  or 
somewhat  rural  areas;  three-quarters  of  these  systems  are  in  areas  with  populations 
under  250,000.  Systems  in  this  peer  group  range  in  size,  but  are  generally  below 
average  in  number  of  peak  vehicles.  They  have  low  peak  to  base  ratios. 

Members  of  the  largest  peer  group,  Peer  Group  7,  are  found  in  all  parts  of  the
United  States.  They  primarily  serve  small  cities  and  large  towns  (77,000  to 
500,000),  although  a  number  are  in  towns  in  metropolitan  New  York.  Systems  in  this 
peer  group  are  average  in  size  and  speed,  but  above  average  in  peak  to  base  ratios. 

Peer  Group  8  has  primarily  Midwestern  and  Eastern  small  to  medium-sized 
cities,  although  a  few  of  its  members  are  from  the  outer  suburban  sections  of  New 
York  and  Chicago.  It  differs  from  other  peer  groups  in  its  high  average  peak  to  base 
ratio  (all  above  2.3).  Systems  in  this  peer  group  range  widely  in  speed  and  size,
though  there  are  no  systems  over  400  peak  vehicles  in  this  group. 

Systems  in  Peer  Group  9  are  all  from  the  Southwestern  areas  of  the  United 
States.  They  predominate  in  suburban,  low  density  areas  with  populations  between 
.5  and  1.5  million.  Systems  in  this  peer  group  are  above  average  in  size  and  speed, 
and  about  average  in  their  peak  to  base  ratios. 

Transit  systems  in  Peer  Group  10  are  all  public  agencies  in  large  urban  areas  (1 
to  3  million),  in  most  areas  of  the  United  States  except  the  Northeast.  These 
systems  have  an  above  average  number  of  peak  vehicles  (260  to  506)  and  usually 
below  average  speeds,  with  a  wide  range  of  peak  to  base  ratios.  Peer  Group  10  is 
similar  to  Peer  Group  11,  though  the  systems  are  smaller  on  average  and  have
slightly  lower  peak  to  base  ratios. 

Peer  Group  11  includes  public  transit  agencies  in  major  urban  areas  (1.4  to  16 
million)  in  all  regions  of  the  United  States.  They  have  a  high  number  of  peak 
vehicles  (666  to  1573)  and  are  second  in  size  only  to  Peer  Group  12.  These  systems 
are  above  average  in  peak  to  base  ratio  and  are  average  in  speed. 

The  transit  agencies  in  Peer  Group  12  are  the  major  public  transit  providers  in 
the  three  largest  urban  areas  of  the  United  States.  All  three  have  over  1900  peak
vehicles.  They  are  one  of  the  two  slowest  groups  of  systems,  and  they  have  slightly
above  average  peak  to  base  ratios. 

Relationship  between  Operating  Characteristics  and  Peer  Groups 

The  description  of  individual  peer  groups  illustrates  the  relationship  between 
peer  groups  and  operating  characteristics.  However,  summary  measures  of  the 
strength  of  the  overall  relationships  between  peer  group  membership  and  each  of  the 
operating  characteristics  are  useful.  The  eta  coefficient  provides  a  summary  of  the
degree  of  association  between  a  number  of  groups  (such  as  the  peer  groups)  and 
another  variable  (for  example,  an  operating  measure).  The  eta  coefficient  squared 
is  interpreted  as  the  proportion  of  variance  in  an  operating  characteristic  which  can 
be  accounted  for  by  peer  group  membership.  Table  3-2  presents  four  eta 
coefficients,  each  describing  the  relationship  between  one  of  the  four  operating 
characteristics  and  the  twelve  peer  groups.  These  results  show  that  the  peer  groups 
capture  a  large  portion  of  the  variability  among  the  agencies  on  all  four  of  the 
operating  variables.  However,  the  groups  seem  to  be  most  strongly  related  to 
differences  in  the  size  of  the  systems. 


TABLE  3-2.  RELATIONSHIP  BETWEEN
OPERATING  CHARACTERISTICS  AND  PEER  GROUPS

Operating Characteristic       Eta     Eta Squared

Total Vehicle Miles            .968    .938
Number of Peak Vehicles        .952    .907
Speed                          .874    .764
Peak to Base Ratio             .915    .837
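
For readers unfamiliar with the statistic, eta squared is the between-group sum of squares divided by the total sum of squares, and eta is its square root. The sketch below shows the calculation on hypothetical values for two small groups; it is an illustration of the standard formula rather than the SPSS routine used for Table 3-2.

```python
import math

def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

# Hypothetical speeds for two peer groups whose members are tightly bunched
# around very different group means.
speeds_by_group = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]

eta_sq = eta_squared(speeds_by_group)  # 150 / 154
eta = math.sqrt(eta_sq)
print(round(eta, 3), round(eta_sq, 3))
```

A value near 1 means that most of the variation in the characteristic lies between groups rather than within them.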


PREDICTING  PEER  GROUP  MEMBERSHIP 

Another  approach  to  looking  at  the  relationship  of  operating  characteristics  to 
peer  group  membership  is  to  ask  whether  a  system's  peer  group  membership  can  be 
predicted  from  its  operating  characteristics.  Two  methods,  discriminant  analysis 
and  construction  of  a  decision  typology,  were  used  to  predict  peer  group  membership 
from  operating  characteristics. 


^N.H.  Nie,  C.H.  Hull,  J.G.  Jenkins,  K.  Steinbrenner  and  D.H.  Bent.  SPSS:
Statistical  Package  for  the  Social  Sciences.  (New  York:  McGraw  Hill,  1975.)

Discriminant  Analysis

Discriminant  analysis  is  a  statistical  technique  for  combining  information  on  a 
number  of  variables  to  make  a  prediction  about  the  group  membership  of  a  case  or  a 
number  of  cases.  The  logic  of  this  technique  is  the  reverse  of  cluster  analysis. 
Whereas  cluster  analysis  attempts  to  construct  groups  of  objects  from  their 
characteristics  on  a  number  of  variables,  discriminant  analysis  takes  the  groups  as 
given,  and  attempts  to  find  the  best  combination  of  the  variables  to  predict 
membership  in  these  groups.  It  may  be  used  either  to  test  the  assignment  of  cases  to 
groups,  or  to  make  group  assignments  for  new  cases  with  unknown  group
membership.  Discriminant  analysis  here  is  used  in  a  descriptive  manner  since  use  of  cluster
analysis  to  form  the  peer  groups  on  operating  characteristics  makes  subsequent 
statistical  tests  of  the  differences  among  groups  on  operating  variables  illegitimate. 

Four  operating  variables  (total  vehicle  miles,  number  of  peak  vehicles,  speed 
and  peak-to-base  ratio)  were  used  to  predict  the  most  likely  peer  group  assignment 
for  each  transit  agency.  Of  the  271  cases  in  the  analysis,  the  group  membership  for 
246  (91%)  was  predicted  correctly.  The  discriminant  analysis  also  reported  a  second 
most  likely  peer  group  assignment  for  each  agency.  Of  the  25  cases  whose  group 
membership  was  incorrectly  predicted  on  the  first  pass,  19  were  correctly  predicted 
on  the  second  pass.  This  is  a  rate  of  98%  correct  on  either  the  first  or  the  second 
prediction. 
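
The idea of a most likely and a second most likely group can be sketched as follows. The ranking rule used here, distance to each group's centroid, is a deliberately simplified stand-in and is not the discriminant function itself; the centroids, cases and function names are hypothetical. The last two lines reproduce the percentages reported above from the stated counts.

```python
import math

def rank_groups(case, centroids):
    """Order group labels from most to least likely, nearest centroid first."""
    return sorted(centroids, key=lambda g: math.dist(case, centroids[g]))

def hit_rates(cases, true_groups, centroids):
    """Share correct on the first prediction, and on the first or second."""
    first = second = 0
    for case, truth in zip(cases, true_groups):
        ranking = rank_groups(case, centroids)
        if ranking[0] == truth:
            first += 1
        elif ranking[1] == truth:
            second += 1
    n = len(cases)
    return first / n, (first + second) / n

# Hypothetical group centroids and cases on two standardized variables.
centroids = {"small": (0.0, 0.0), "medium": (4.0, 0.0), "large": (10.0, 0.0)}
cases = [(0.5, 0.0), (3.0, 0.0), (2.5, 0.0), (9.0, 0.0)]
truth = ["small", "medium", "small", "large"]

first_rate, either_rate = hit_rates(cases, truth, centroids)

# Counts reported in the text: 246 of 271 correct first, 19 more on the second.
report_first = round(100 * 246 / 271)          # 91
report_either = round(100 * (246 + 19) / 271)  # 98
```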

Results  of  this  analysis  indicate  that  the  membership  in  peer  groups  can  be 
predicted  quite  accurately  from  information  on  operating  characteristics.  The  fact 
that  discriminant  analysis  uses  a  different  mathematical  model  to  combine 
information  on  operating  characteristics  than  does  cluster  analysis,  lends  additional 
confidence  to  the  conclusion  that  peer  groups  do  capture  differences  among  transit 
agencies  on  operating  characteristics. 

Typology 

Another  way  to  demonstrate  the  validity  of  the  differences  among  the  peer 
groups  on  the  operating  characteristics  is  to  construct  a  set  of  decision  rules  for 
assigning  agencies  to  peer  groups  based  on  their  operating  characteristics.  This  also 
has  great  practical  significance  since  although  cluster  analysis  constructs  a  set  of 
groups  from  data  on  characteristics  of  the  agencies,  it  does  not  handle  the  problem 
of  the  assignment  of  new  cases. 

Figure  3-1  presents  a  decision  tree  which  makes  a  prediction  of  peer  group 
membership  for  each  transit  system  based  on  its  number  of  peak  vehicles,  peak  to
base  ratio  and  speed.  Since  total  vehicle  miles  and  number  of  peak  vehicles  are
highly  correlated  for  the  sample  of  cases,  only  the  number  of  peak  vehicles  was 
necessary  in  the  decision  tree  to  distinguish  among  the  peer  groups. 

By  starting  at  the  top  of  the  decision  tree,  and  following  the  path  corresponding 
to  the  operating  characteristics  of  a  system,  a  transit  agency  can  be  assigned  to  its 
appropriate  peer  group.  A  test  of  this  typology  on  the  FY  1980  data  correctly 
predicted  peer  group  membership  of  97%  of  the  cases.  This  typology  could  also  be 
used  to  predict  the  peer  group  membership  for  agencies  which  did  not  report  data  in 
FY  1980,  or  to  construct  peer  groups  from  data  reported  for  other  years. 
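
Figure 3-1 is not reproduced in this text, so its branch points are not known here. The fragment below is only a partial sketch of what such decision rules look like, using the peak vehicle ranges stated in this chapter for Peer Groups 11 and 12. The function name is hypothetical, and the branches on speed and peak to base ratio that would be needed to separate the smaller systems are deliberately left out.

```python
def assign_large_system(peak_vehicles):
    """Return peer group 11 or 12 for very large systems, otherwise None.

    Cut points are the ranges reported in the chapter text, not the actual
    branch points of Figure 3-1, which are not available here.
    """
    if peak_vehicles > 1900:           # all three Group 12 systems exceed 1900
        return 12
    if 666 <= peak_vehicles <= 1573:   # reported range for Group 11
        return 11
    return None  # smaller systems also require speed and peak to base ratio

print(assign_large_system(2477), assign_large_system(889), assign_large_system(57))
```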

The  success  of  this  method  of  predicting  peer  group  membership  on  the  basis  of 
operating  characteristics  lends  further  support  to  the  validity  of  the  peer  groups  in 
terms  of  capturing  differences  among  agencies  on  operating  characteristics. 

CLUSTER  ANALYSIS  ON  PERFORMANCE  VARIABLES 

An  analysis  was  done  to  develop  peer  groups  on  the  basis  of  performance  for 
several  reasons.  Since  peer  groups  based  on  operating  characteristics  capture 
significant  differences  on  performance,  it  is  of  some  interest  to  determine  if  the 
inverse  is  true — those  cases  most  similar  in  performance  will  also  be  similar  in  their 
operating  characteristics.  Peer  groups  based  on  performance  could  also  be  used  as  a 
research  tool  for  exploring  possible  causes  of  higher  performance.  Although  a  set  of 
four  operating  variables  was  used  to  make  peer  groups,  there  may  be  many  other
features  of  a  transit  system — such  as  its  management  form,  allocation  of  expenses 
to  various  functions  or  geographical  location — which  contribute  to  performance. 
Performance  peer  groups  could  be  used  to  generate  hypotheses  about  which  other 
facets  of  operations  lead  to  specific  patterns  of  performance. 

The  first  set  of  marker  variables  identified  in  the  previous  chapter  were  used  as 
the  measures  of  performance.  One  cluster  solution  was  based  on  all  seven  markers, 
another  on  the  first  three.  The  first  three  were  used  for  an  alternate  solution 
because  they  are  the  most  important  in  terms  of  the  variance  in  performance  they 
capture.  They  also  represent  the  three  major  aspects  of  the  performance  model. 

The  same  methods  were  used  for  clustering  the  cases  on  performance  as  were 
used  for  the  operating  variables.  All  values  were  standardized  prior  to  the  analysis. 
Euclidean  distance  was  used  as  the  measure  of  dissimilarity  and  the  centroid  method 
of  clustering  was  used.

The  solution  found  with  seven  performance  variables  provided  neither  a
complete  nor  useful  set  of  peer  groups.  Only  205  out  of  304  cases  had  enough  data
to  enter  the  analysis.  Of  these  only  140  (68%  of  205)  entered  into  a  distinct  peer
group.  The  others  were  left  as  outliers  or  grouped  into  tiny  clusters  of  2  or  3  cases,
too  small  for  practical  application.  Thus,  overall,  only  46%  of  the  304  transit
systems  became  members  of  useful  peer  groups.  It  was  concluded  that  with  so  many 
independent  measures  of  performance,  there  simply  were  not  coherent  patterns  of 
performance  across  all  seven  variables.  Many  cases  were  statistically  too  different 
to  cluster  with  other  cases. 

The  peer  group  solution  with  the  first  three  performance  indicators — 
RVH/OEXP,  TPAS/RVH  and  OREV/OEXP— gave  a  more  useful  division.  Two 
hundred  thirty  transit  agencies  had  enough  data  to  enter  the  analysis  and  all  but 
seven  of  these  became  members  of  a  peer  group.  Thus  76%  of  the  304  cases  were 
clustered  on  the  basis  of  performance.  Seven  performance  peer  groups  were  formed 
with  between  13  and  66  members  each.

Table  3-3  shows  how  much  association  there  is  between  peer  group  membership
based  on  performance  and  the  seven  performance  marker  variables.  The  proportion
of  variance  accounted  for  by  performance  peer  group  membership  is  given  by  the
eta  squared  values.  As  would  be  expected,  the  three  performance  indicators  that  were  used
to  form  these  peer  groups  have  much  of  their  variability  accounted  for,  as  shown  by
eta  squared  coefficients  of  .622  to  .689.  However,  comparison  to  Table  3-2  reveals  that
discrimination  on  performance  is  much  less  than  that  on  operating  characteristics.
The  lowest  eta  squared  on  an  operating  characteristic  is  .764.  Thus  peer  groups  based  on
performance  are  not  as  distinct  from  each  other  as  are  those  based  on  operating
characteristics.

The  peer  groups  based  upon  three  performance  indicators  also  discriminate  on
the  four  other  marker  variables,  although  to  a  much  lower  degree.  For  vehicle
efficiency,  less  than  4%  of  the  variation  can  be  accounted  for  by  performance  peer
group  membership.  This  suggests  that  the  performance  groups  do  not  capture  much
of  the  differences  on  vehicle  efficiency.  Labor  efficiency  is  well  accounted  for  with
an  eta  squared  of  .202.  Maintenance  efficiency  and  safety  fall  in  between  labor  and  vehicle
efficiency.

Table  3-4  relates  the  performance  peer  groups  to  the  operating 
characteristics.  Size,  as  measured  by  the  number  of  peak  vehicles,  is  well 
differentiated  between 
TABLE 3-3. RELATIONSHIP BETWEEN PERFORMANCE CHARACTERISTICS AND
PERFORMANCE PEER GROUPS

Performance Characteristic            Eta     Eta²

Cost Efficiency (RVH/OEXP)            .788    .622
Service Utilization (TPAS/RVH)        .807    .651
Revenue Generation (OREV/OEXP)        .830    .689
Labor Efficiency (TVH/EMP)            .450    .202
Vehicle Efficiency (TVM/PVEH)         .191    .037
Maintenance Efficiency (TVM/MNT)      .281    .079
Safety (TVM/ACC)                      .324    .105

TABLE 3-4. RELATIONSHIP BETWEEN OPERATING CHARACTERISTICS AND
PERFORMANCE PEER GROUPS

Operating Characteristic              Eta     Eta²

Peak Vehicles                         .525    .276
Peak to Base Ratio                    .404    .163
Speed                                 .222    .050

performance peer groups. Peak to base ratio has a lesser but still important amount
of  its  variance  accounted  for.  Speed  is  poorly  differentiated  between  the 
performance  peer  groups. 

The  descriptive  statistics  for  each  performance  peer  group  on  each  marker 
variable  and  operating  variable  are  given  in  Appendix  F. 

The  peer  groups  based  on  three  performance  indicators  are  quite  similar  to 
those  produced  by  the  cluster  analysis  based  on  seven  performance  variables.  Some 
of  the  peer  groups  in  the  seven  variable  analysis  are  all  members  of  the  same  peer 
group  in  the  three  variable  analysis.  Some  others  are  divided  between  two  of  the 
peer  groups  in  the  three  variable  analysis.  Overall,  the  three  variable  peer  group 
analysis  is  better  because  it  encompasses  more  cases,  forms  a  more  distinctive 
pattern  of  clustering  and  maintains  much  of  the  structure  found  in  the  seven 
variable  solution. 

However,  the  solution  found  for  the  cluster  analyses  on  operating  variables  and 
the  one  on  performance  variables  have  little  in  common.  No  more  than  43%  of  any 
of  the  seven  peer  groups  found  with  performance  indicators  fell  into  the  same  peer 
group  based  on  operating  characteristics.  In  fact,  on  the  average,  only  13.2%  of  the 
cases  from  the  performance  clusters  moved  together  into  the  same  cluster  based  on 
operating  variables.  Or  put  another  way,  each  peer  group  based  on  performance  has 
members  from  about  seven  of  the  peer  groups  based  on  operating  characteristics. 
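
The kind of overlap comparison described above can be sketched as a simple cross-tabulation of the two sets of peer group labels. The report does not state exactly how its percentages were calculated, so this is only one plausible reading: for each performance peer group, it finds the largest share of members that land together in one operating peer group, and counts how many operating peer groups the members are spread across. The function name and labels are hypothetical.

```python
from collections import Counter, defaultdict

def overlap_summary(perf_labels, oper_labels):
    """Summarize how each performance peer group spreads over operating groups.

    Returns {performance group: (largest share found in a single operating
    group, number of operating groups spanned)}.
    """
    spread = defaultdict(Counter)
    for perf, oper in zip(perf_labels, oper_labels):
        spread[perf][oper] += 1
    return {
        perf: (max(counts.values()) / sum(counts.values()), len(counts))
        for perf, counts in spread.items()
    }

# Hypothetical labels for seven systems (not Section 15 data).
perf_groups = ["A", "A", "A", "A", "B", "B", "B"]
oper_groups = [1, 1, 2, 3, 2, 2, 2]
summary = overlap_summary(perf_groups, oper_groups)
```

A low largest share combined with a high count of groups spanned would correspond to the pattern reported here, where the two typologies have little in common.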

Although  the  peer  groups  based  on  performance  capture  some  of  the  differences 
in  operating  characteristics,  each  of  these  peer  groups  covers  a  larger  range  of  size 
and peak to base ratios than any of the peer groups based on operating
characteristics. Thus they are useful in demonstrating that managers are able to affect
patterns  of  efficiency  and  effectiveness  despite  constraints  determined  by  operating 
characteristics.  Good  performance  is  not  just  the  province  of  transit  systems  with 
the  optimal  size  or  peak  to  base  ratio. 

USES  OF  PEER  GROUPS  BASED  ON  OPERATING  AND  PERFORMANCE 
VARIABLES 

Viable  peer  groups  were  found  through  analysis  on  both  operating 
characteristics  and  performance  indicators.  Since  these  groupings  are  quite 
different  from  each  other,  it  is  necessary  to  consider  which  is  better  for  managers 
to use when evaluating their transit system. Most other authors¹,²,³ who have
worked on developing methods for evaluating transit performance have limited their
analyses to transit systems which are similar in size, type of service area and
characteristics of the population to be served (e.g., percentage of elderly, wage
levels). The principle behind these approaches is that transit managers are forced
to work within a given set of parameters.

It  is  unfair  and  uninformative  to  compare  transit  systems  which  function  under 
totally  different  sets  of  circumstances  which  are  not  directly  under  the  transit 
system's  control.  This  principle  has  guided  this  research  as  well.  But,  unlike  many 
of  the  previous  efforts,  the  object  of  this  research  was  to  develop  a  method  of 
comparative evaluation which is nationwide in scope. Many states do not have
enough transit systems to make adequate norms for comparison, and performance
audits select systems in ways that can be misleading. Even those states with many
transit  systems,  such  as  New  York  and  Michigan,  have  systems  which  operate  under 
quite  different  circumstances,  such  as  the  differences  between  the  New  York 
metropolitan  area  and  upstate  New  York. 

The peer groups defined in this research are based upon operating
characteristics which are measured by information obtainable from Section 15 data.
Comparisons within these groups are more valid than comparing systems with similar
performance  because  they  are  based  upon  inherent  characteristics  of  transit 
operations  reflecting  the  demands  of  the  service  area  and  management's  response. 
Further,  it  was  found  that  more  complete  information  was  available  for  basic 
operating  variables  than  for  performance  indicators.  Only  two  hundred  and  thirty 
cases  could  be  clustered  on  the  basis  of  performance,  while  274  could  be  clustered 
on  the  basis  of  operating  characteristics.  Operating  characteristics  are  also  more 


¹Dennis F. McCrossen, Choosing performance indicators for small transit
systems. Transportation Engineering, 48(3), March 1978, 26-50.

²Kumares C. Sinha, David P. Jukins and Oreste M. Bevilacqua, Stratification
approach to evaluation of urban transit performance. Transportation Research
Record No. 761 (Washington, D.C.: Transportation Research Board, 1980), pp.
20-27.

³James M. Holec, Dianne S. Schwager and Angel Fandialan, Use of federal
Section 15 data in transit performance: Michigan program. Transportation
Research Record No. 746 (Washington, D.C.: Transportation Research Board,
1980), pp. 36-38.
likely to be stable from year to year, thus allowing the same basic set of peer groups
to  be  used  repeatedly. 

Peer  groups  based  on  operating  characteristics  provide  more  discriminatory 
power  across  the  entire  set  of  performance  indicators.  The  analysis  based  on 
performance  indicators  found  that  not  all  performance  indicators  varied 
significantly  between  peer  groups,  and  only  the  first  three  performance  indicators 
were clearly different for each peer group. Moreover, when clustering is based upon
performance, there is not much to be learned by comparing a transit system to those
with similar performance; they will already be too similar to make small distinctions
meaningful. In addition, a case at the lower end of performance in its performance
peer group would be similar to the best performer in an adjacent peer group.
Learning that it is the worst performer in the set of best performing transit systems
does not encourage management to improve; it is more likely to dispute the
system's placement.

Therefore, use of peer groups based upon operating characteristics is
advocated. Comparisons of performance within such groups are more indicative of
how well management is performing; more systems can be included in such an
analysis; the peer groups are more stable; and such comparison has more
discriminatory power for revealing particularly weak or strong systems operating
under similar circumstances.
CHAPTER 4
THE  PERFORMANCE  OF  PEER  GROUPS 


INTRODUCTION 

The  previous  three  chapters  have  presented  the  technical  aspects  of  how  this 
project  identified  the  major  dimensions  of  bus  transit  performance  and  established 
peer  groups  of  bus  companies  that  share  similar  operating  characteristics.  The  main 
goal  of  this  chapter  is  to  demonstrate  how  the  combined  use  of  peer  groups  and  key 
performance  indicators  creates  a  powerful  tool  for  understanding  differences  in 
transit  performance.  The  first  section  describes  the  performance  of  each  peer 
group on each performance indicator compared to national norms. The second
section answers the question: Does overall performance on each indicator vary
significantly between peer groups? The third section examines whether the operating
characteristics which were used to create the peer groups have a structured
relationship to each of the performance indicators. The chapter concludes with
a discussion of further uses for this type of performance evaluation.

The  performance  indicators  and  peer  groups  were  identified  through  statistical 
means,  taking  into  account  the  reliability  of  different  parts  of  the  Section  15  data 
base,  patterns  of  correlations  between  variables  and  other  mathematical  properties 
inherent  to  the  data.  These  statistical  analyses  strongly  support  the  choices  of 
performance  indicators  and  the  structuring  of  peer  groups.  However,  a  statistically 
significant  analysis  does  not  guarantee  that  the  results  of  the  analysis  will  have 
important  implications  in  practical  contexts.  While  more  statistical  analysis  will  be 
presented  in  this  chapter  to  further  substantiate  the  validity  of  the  results,  emphasis 
will  be  placed  on  demonstrating  that  the  peer  groups  and  performance  indicators 
capture  major,  distinctive  patterns  of  performance.  Graphs  and  verbal  descriptions 
of  the  data  will  be  used  in  this  chapter  in  preference  to  statistical  tables,  so  that 
the  patterns  of  relationships  may  be  shown  without  recourse  to  statistics  or 
technical  interpretations. 

Another  major  emphasis  will  be  placed  on  showing  how  the  inherent  operating 
characteristics of transit systems relate to performance. Operating characteristics
of  a  transit  system  such  as  size,  speed  and  peak  to  base  ratio  have  a  structured, 
although  complex  relationship  to  transit  performance,  in  both  direct  and  indirect 
ways. Oram has described how labor costs are greatly increased by high peak to
base ratios.¹ Speed is also significantly related to cost per vehicle mile.²

Operating characteristics also act indirectly on transit performance by serving
as proxies for environmental and demographic characteristics of the service area. A
number of studies have found significant relationships between service area
characteristics and transit performance. Giuliano found that for California transit
systems, the size of the service area, population density and the size of the urban
area are related to cost efficiency and labor efficiency.³ In cross-sectional national
studies, Nelson⁴ and Miller⁵ found, respectively, that demographic variables (low
income households, relatively young or old populations, percentage of auto-less
families) and environmental factors (city age, city size) significantly influence bus
transit costs. In their discussions of their results, these authors link the
environmental and demographic variables to the operating characteristics of transit
systems used in this study.

Speed  is  related  to  the  population  density  of  a  service  area,  the  traffic 
congestion  on  major  roads  and  the  kinds  of  routes  operated  by  a  transit  system  (e.g., 
express  vs  local).  Peak  to  base  ratios  indicate  whether  a  transit  system  is  oriented 
to  work  bound  commuters  (high  peak  to  base  ratio)  or  to  transit  dependent 
populations  such  as  the  elderly,  the  low  income  and  students.  To  some  degree  the 
peak  to  base  ratio  will  also  reflect  environmental  factors  such  as  the  lack  of  parking 
and  insufficient  highway  capacity  which  influence  service  utilization  in  older 
Eastern  cities.  The  size  of  a  transit  system  (measured  by  both  the  number  of  peak 


¹Richard L. Oram, Peak period supplements: The contemporary economics of
urban bus transport in the U.K. and U.S.A. Progress in Planning, 1979, 12(2), 81-154.

²James H. Miller and John C. Rea, Comparison of cost models for urban
transit. Highway Research Record No. 435 (Washington, D.C.: Transportation
Research Board, 1973), pp. 11-19.

³Genevieve Giuliano, The effect of environmental factors on the efficiency of
public transit service. Transportation Research Record No. 797 (Washington, D.C.:
Transportation Research Board, 1981), pp. 11-16.

⁴Gary R. Nelson, An econometric model of urban bus transit operations.
(Unpublished Ph.D. Dissertation, Rice University, 1972.) Available from University
Microfilms International as No. 72-26457.

⁵David R. Miller, Differences among cities, differences among firms, and costs
of urban bus transport. Journal of Industrial Economics, 1970, 19(1), 22-32.
vehicles and total vehicle miles) is related to other constraints on operations:
organized labor units are more influential in larger agencies and efficient route
scheduling is more difficult, and these cause diseconomies of scale, reducing the
advantages gained through service integration.

Since  there  is  more  evidence  for  the  environmental  and  demographic  influences 
on  transit  performance  than  for  the  direct  influence  of  operating  characteristics,  it 
is necessary to assess the actual impact of the four selected operating
characteristics on the performance indicators used in this study. Further, all of the
previously cited studies used relatively small samples of transit systems (fewer than
35 in each case) which do not purport to represent the national transit industry. The
data reported here are more comprehensive than those used in the earlier research.
In  addition,  the  other  studies  did  not  assess  the  relationship  of  operating  and 
environmental  factors  to  a  set  of  performance  measures  that  represent  distinct 
major  dimensions  of  transit  performance.  Thus  the  relationship  of  operating 
characteristics  to  the  major  dimensions  of  performance  will  be  examined  in  detail. 

PERFORMANCE  PROFILES  OF  PEER  GROUPS 

Each  peer  group  can  be  characterized  by  its  relative  strengths  and  weaknesses 
across  the  seven  performance  indicators.  While  two  peer  groups  may  look  quite 
similar  on  any  given  performance  measure,  no  two  peer  groups  were  identical  across 
all  seven  of  them.  In  addition,  no  single  peer  group  can  be  credited  with  the  best 
overall  performance.  There  are  apparently  tradeoffs  between  measures.  For 
instance,  all  peer  groups  with  high  ridership  also  have  relatively  expensive  service. 

The  performance  profile  of  each  peer  group  was  created  by  comparing  its 
average  (mean)  score  on  each  performance  indicator  to  the  average  for  the  entire 
nation  as  represented  in  Section  15  data.  On  the  graphs  that  follow,  the  national 
average  is  indicated  by  a  zero  on  the  vertical  axis.  Scores  above  the  zero  indicate 
above  average  performance,  ranging  up  to  one  standard  deviation  above  the  national 
mean.  Scores  below  the  zero  indicate  below  average  performance,  ranging  down  to 
one  standard  deviation  below  the  national  mean.  Numerical  data  used  to  construct 
these  graphs  are  given  in  Appendix  G. 
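
The construction of a performance profile described above can be sketched as follows. This is only an illustration under stated assumptions: the national mean and standard deviation are taken over all systems pooled together, the population form of the standard deviation is used (the report does not say which form was applied), and the function name and values are hypothetical.

```python
from statistics import fmean, pstdev

def performance_profile(values_by_group):
    """Express each peer group's mean on one indicator in standard
    deviations from the national mean (zero equals the national mean)."""
    all_values = [v for vals in values_by_group.values() for v in vals]
    national_mean = fmean(all_values)
    national_sd = pstdev(all_values)  # assumption: population standard deviation
    return {
        group: (fmean(vals) - national_mean) / national_sd
        for group, vals in values_by_group.items()
    }

# Made-up indicator values for two hypothetical peer groups.
profile = performance_profile({"g1": [1.0, 2.0, 3.0], "g2": [7.0, 8.0, 9.0]})
```

A positive score places the group above the national mean on that indicator and a negative score places it below.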

Although  each  peer  group  is  compared  to  the  national  average  for  each 
performance  indicator,  this  is  a  descriptive  device  and  not  intended  to  act  as  an 
absolute  standard  of  performance.  Standards  need  to  be  developed  relative  to  each 
peer  group  because  each  is  operating  under  different  constraints.  In  this  sample  of 
transit  systems,  about  half  are  below  the  national  average  and  half  are  above  the 
national average on each performance indicator. Although no peer group averages
more  than  about  one  standard  deviation  above  or  below  the  national  mean, 
approximately  30%  of  the  individual  transit  systems  will  be  more  than  one  standard 
deviation  from  the  mean. 

For  ease  of  comprehension,  the  discussion  for  each  peer  group  refers  to  the 
general  concepts  measured  by  the  performance  indicators.  However,  the  graphs 
represent performance on the seven marker variables identified in Chapter 2 and
shown in Table 2-4. Table 4-1 summarizes which marker variables represent which
general concept.


TABLE 4-1. THE RELATION OF MARKER VARIABLES
TO PERFORMANCE CONCEPTS

Marker Variable     Concepts

RVH/OEXP            Cost Efficiency
                    Output per Dollar Cost

TPAS/RVH            Service Utilization
                    Service Effectiveness

OREV/OEXP           Revenue Generation
                    Cost Effectiveness

TVH/EMP             Labor Efficiency

TVM/PVEH            Vehicle Efficiency

TVM/MNT             Maintenance Efficiency

TVM/ACC             Safety

Peer Group 1

The  two  private  bus  companies  in  Peer  Group  1  stand  out  because  of  their  high 
average  speed  and  long  passenger  trips.  Although  both  have  exceptionally  high 
revenue  generation  performance,  they  are  below  the  national  average  on  the 
measures  of  cost  efficiency,  service  utilization  and  labor  efficiency.  The  high  score 
on  vehicle  efficiency  is  a  result  of  one  company's  unusually  high  mileage  per  peak 
vehicle.  However,  because  of  their  high  speed,  both  companies  generate  many 
vehicle  miles  and  do  well  in  both  safety  and  maintenance  efficiency. 
FIGURE 4-1. PERFORMANCE PROFILE FOR PEER GROUP 1
(Vertical axis: standard deviations from mean. Horizontal axis: performance
indicators. Zero on vertical axis equals national mean. N = 2.)

Although  there  were  only  two  agencies  represented  in  the  FY  1980  data  for  this 
peer  group,  it  is  anticipated  that  many  more  of  this  type  will  report  in  the  future. 
Since FY 1982, agencies operating under contract to public agencies have been
encouraged  to  report,  because  a  metropolitan  area's  share  of  federal  transit 
assistance  is  determined,  in  part,  by  the  revenue  vehicle  service  miles  operated  by 
all  companies  in  the  region. 

Peer  Group  2 

The 16 systems in Peer Group 2 are small, fast bus companies in small urban or
suburban areas across the United States. They excel in vehicle efficiency, safety
and maintenance efficiency, having high mileage per peak vehicle, per accident and
per maintenance employee. As a group, these systems have a below average
performance on all other measures, although there is great variation in cost
efficiency, service effectiveness and ratio of operating revenue to operating
expense.

FIGURE 4-2. PERFORMANCE PROFILE FOR PEER GROUP 2
(Vertical axis: standard deviations from mean. Horizontal axis: performance
indicators. Zero on vertical axis equals national mean. N = 16.)

Peer  Group  3 

Peer Group 3 is a cross-national group but draws disproportionately from the
Southwest. These systems are small and about average in both speed and peak to
base ratio. Peer Group 3's performance profile is quite similar to Peer Group 6's,
except that a slightly lower level of cost efficiency is traded off for a higher level
of vehicle efficiency, reflecting the somewhat higher average speeds of this group.
Although Peer Group 3 has a just below average number of passengers per hour, its
revenue generation is the second lowest among the peer groups. This possibly
reflects the state operating assistance available to many of these systems and the
local desire to retain low fares.

FIGURE 4-3. PERFORMANCE PROFILE FOR PEER GROUP 3

Peer Group 4

Although Peer Group 4 draws from all parts of the country, it contains only
seven systems. These systems are small and serve small cities with suburban
characteristics, as indicated by their high average speed (15.9 to 16.8 mph). The
group's performance profile hovers just around average on all measures. It is just
above average on service effectiveness, vehicle efficiency and safety. All other
measures are slightly below average.

FIGURE 4-4. PERFORMANCE PROFILE FOR PEER GROUP 4

Peer  Group  5 

Peer Group 5 is unusual in that nearly half of its fifteen members are private
bus companies in the New York-New Jersey metropolitan area, while most of the
rest are in smaller Midwestern cities. Peer Group 5's performance profile is the
inverse of Group 9, with well above average performance in cost efficiency,
revenue generation and employee efficiency. Below average performance is
indicated for service effectiveness and vehicle efficiency, although this is in
large part a result of long passenger trip lengths and slow speeds. Because this
group has a mixture of private and public companies, revenue generation is a very
heterogeneous variable, and this peer group has both the highest and lowest levels of
subsidization to be found among all transit systems.

FIGURE 4-5. PERFORMANCE PROFILE FOR PEER GROUP 5

Peer  Group  6 

Peer Group 6 draws from most regions of the United States but with a
particular emphasis on the Midwest and deep South central regions. The forty-five
members of this group are small to medium systems with a low peak to base ratio.
Peer Group 6 does well in cost efficiency, labor efficiency, vehicle efficiency and
maintenance efficiency. Their below average service effectiveness as a group is
actually a function of a few systems with very low numbers of passengers per
service hour. They are average in safety and revenue generation. Overall, this
group has a very high level of performance, rivalled only by Peer Group 3.

FIGURE 4-6. PERFORMANCE PROFILE FOR PEER GROUP 6

Peer  Group  7 

As members of the largest peer group, the seventy-eight Peer Group 7 systems
are found in all parts of the United States. They range in size from small to
medium, have average speed and slightly above average peak to base ratios.
They are slightly above average in cost efficiency, labor efficiency and
cost-effectiveness. They are slightly below average on all other measures. Although
the group is large, the systems are relatively homogeneous on most measures, with
some exceptions and outliers.

FIGURE 4-7. PERFORMANCE PROFILE FOR PEER GROUP 7

Peer  Group  8 

Peer Group 8 has thirty-three small to medium transit systems which share in
having a higher than average peak to base ratio. They primarily serve cities. The
major strength of Peer Group 8 is that it has a slightly above average proportion of
operating revenue relative to operating expense, i.e., they recover a relatively high
level of their expenses from the fare box. However, they have the lowest vehicle
efficiency, probably because of their strong peak orientation. They are slightly
below average on the other measures of performance with little variation among
them.

FIGURE 4-8. PERFORMANCE PROFILE FOR PEER GROUP 8

Peer  Group  9 

Peer Group 9 systems are all from the Southwestern region of the United States
in suburban areas. These eight systems are above average in size and speed. In this
peer group very high vehicle efficiency and above average numbers of passengers
are weighed against quite low cost-efficiency, revenue generation, and labor
efficiency. Safety and vehicle maintenance efficiency are just below average.
However, the low average on revenue generation is a result of two systems with
unusually high levels of local assistance.

FIGURE 4-9. PERFORMANCE PROFILE FOR PEER GROUP 9

Peer  Group  10 

Peer Group 10's eight transit systems are above average in size, below average
in speed, and they serve major urban areas in most parts of the United States,
except the Northeast. Peer Group 10 is well above average in the number of
passengers carried per hour. It is also slightly above average in vehicle
efficiency. Its low point is cost efficiency, with revenue vehicle hours per
operating expense being well below average. On the other measures of performance,
Peer Group 10 is just slightly below the mean for all transit systems.

FIGURE 4-10. PERFORMANCE PROFILE FOR PEER GROUP 10

Peer  Group  11 

Peer Group 11 systems are primarily very large, public transit agencies in major
urban areas. These thirteen systems are above average on peak to base ratio. Peer
Group 11's major strength is the number of passengers per hour carried. Only Peer
Group 12 carries more passengers. Peer Group 11 also generates an above average
amount of revenue. However, the service provided by the systems in Peer Group 11
tends to be more expensive than average. On safety, labor efficiency, vehicle
efficiency and maintenance efficiency, Peer Group 11 is slightly below average in
performance.

FIGURE 4-11. PERFORMANCE PROFILE FOR PEER GROUP 11


Peer  Group  12 

The transit agencies in Peer Group 12 are the major public transit providers in
the three largest urban areas of the United States. All three have well above
average ridership per service hour and a correspondingly high level of operating
revenue. However, they have uniformly high expenses and low labor efficiency.
Vehicle efficiency is average because two systems are below average but one is well
above average. They also have well below average safety and maintenance records
as a group.

FIGURE 4-12. PERFORMANCE PROFILE FOR PEER GROUP 12


COMPARISON  OF  PERFORMANCE  INDICATORS  ACROSS  PEER  GROUPS 

While  each  peer  group  has  its  own  distinctive  performance  profile,  this  does  not 
prove  that  each  performance  indicator  by  itself  captures  important  differences 
between peer groups. Two methods were used to explore this issue. A series of
statistical tests was used to see if the peer groups were significantly different on
each performance indicator. Then, each performance indicator was displayed
graphically to show the important, structured ways that the performance indicators
vary across peer groups.

Peer  Group  Differences 

Analysis  of  variance  tests  whether  there  are  statistically  significant  differences 
between  the  means  of  a  set  of  groups.  It  does  this  by  measuring  whether  the 
variation  between  groups  is  greater  than  that  within  groups.  Thus,  if  the  groups  are 
more  different  from  each  other  than  are  the  individual  transit  systems  within  the 
groups,  there  is  a  significant  difference  between  the  groups  overall. 
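
The comparison of between-group and within-group variation can be illustrated with a short sketch of the standard one-way F ratio. This is not the SPSS routine used in this research; the function name and data are hypothetical, and the significance level (which requires the F distribution) is not computed here.

```python
from statistics import fmean

def one_way_f(groups):
    """F ratio for a one-way analysis of variance.

    `groups` is a list of lists, one inner list of values per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual cases around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    # Each sum of squares is divided by its degrees of freedom.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up values for three small groups (not Section 15 data).
f_score = one_way_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

A large F ratio indicates that the group means differ by more than would be expected from the spread of cases within the groups.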

A one-way analysis of variance was done for each performance indicator using
the SPSS program Breakdown.¹ Table 4-2 shows the F score, eta coefficient and
level of significance for each performance indicator.

TABLE 4-2. RELATIONSHIP BETWEEN PERFORMANCE
VARIABLES AND PEER GROUPS

Performance Measure                   F score    Significance Level    Eta

Cost Efficiency (RVH/OEXP)             9.839          .0000            .55
Service Utilization (TPAS/RVH)         6.319          .0000            .49
Revenue Generation (OREV/OEXP)         3.466          .0002            .36
Labor Efficiency (TVH/EMP)             5.533          .0000            .44
Vehicle Efficiency (TVM/PVEH)         24.300          .0000            .71
Maintenance Efficiency (TVM/MNT)       4.441          .0000            .41
Safety (TVM/ACC)                       5.854          .0000            .46


For each performance indicator, there are highly significant differences
between groups. The standard cutoff point for significance is a probability
level not exceeding .05, and the significance levels listed in the table show that
every performance indicator falls far below this cutoff. The measures of
revenue generation and maintenance efficiency are slightly less differentiated
between peer groups than the others, as shown by their lower F scores and slightly
lower eta coefficients. Vehicle efficiency shows the greatest differentiation
between groups, as shown by its larger F score (24.300) and large eta (.71).

However, this analysis does not indicate which peer groups are most different
or whether the differences are important. It is possible that one or two peer groups
are radically different from the others, and the others are indistinguishable from
each other. Thus, a representation of the actual data will be used to show the
structure of peer group differences.

^N. H. Nie, C. H. Hull, J. G. Jenkins, K. Steinbrenner and D. H. Bent, SPSS:
Statistical package for the social sciences. (New York: McGraw-Hill, 1975.)

Comparison  of  Peer  Groups 

The  following  set  of  graphs  compares  the  peer  groups  on  each  of  the  seven 
performance  indicators.  Each  peer  group  is  represented  by  a  bar  with  an  X  on  it. 
The  length  of  the  bar  shows  the  range  of  values  that  the  transit  systems  in  that 
group  achieved  on  that  performance  measure.  The  left  end  of  the  bar  represents  the 
minimum  value  and  the  right  end  represents  the  maximum  value.  The  X  on  each  bar 
shows  the  average  (mean)  for  that  peer  group.  Since  a  higher  value  indicates  better 
performance,  the  bars  with  X's  farther  to  the  right  represent  better  average 
performance.  The  longer  bars  represent  more  variation  on  a  performance  measure. 
When  an  X  is  not  centered  on  a  bar,  it  shows  that  the  peer  group  is  not  evenly 
distributed  around  the  mean.  Therefore,  the  side  of  the  bar  which  is  longest 
typically  contains  one  or  two  values  which  are  extreme  compared  to  the  rest  of  the 
systems  in  the  peer  group. 
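
The construction of each bar can be sketched as follows. This is an illustrative text rendering only, assuming hypothetical values; the helper names `range_summary` and `text_bar` are not part of the original analysis.

```python
from statistics import mean

def range_summary(values):
    """Return (minimum, mean, maximum) for one peer group."""
    return min(values), mean(values), max(values)

def text_bar(values, axis_min, axis_max, width=40):
    """Draw one bar as text: '-' spans minimum to maximum, 'X' marks the mean."""
    lo, avg, hi = range_summary(values)

    def column(x):
        # Map a value on the axis to a character position.
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    cells = [" "] * width
    for i in range(column(lo), column(hi) + 1):
        cells[i] = "-"
    cells[column(avg)] = "X"
    return "".join(cells)

# Hypothetical indicator values for two peer groups.
groups = {"Group A": [2.0, 3.0, 4.0], "Group B": [5.0, 5.5, 6.0, 9.5]}
for label, values in groups.items():
    print(f"{label:8s}|{text_bar(values, 0.0, 10.0)}|")
```

In this made-up example the X for Group B sits left of the bar's center because a single high value (9.5) stretches the right side of the bar, which is the situation described above.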

Each  performance  indicator  will  be  described  in  terms  of  how  the  peer  groups 
compare  on  average  performance,  how  much  the  peer  groups  overlap  in  performance 
and  what  patterns  exist  across  the  peer  groups. 

Cost  Efficiency 

The pattern of values for cost efficiency shows wide differences between peer
groups. Two peer groups, 11 and 12, not only have a significantly lower average for
hours per dollar of operating expense, but their best performing members do not do
as well as the poorest performing members of groups 4, 5, 6 and 7. Groups 11 and 12
are the very largest systems in this sample. Groups 1 and 2, which are peer groups of
small systems, also show lower cost efficiency. This suggests that size reduces cost
efficiency at the extremes. The two peer groups which have the next largest
members, 9 and 10, also have less cost efficiency on the average, but they overlap
the performance of the peer groups with smaller systems. The peer groups with
small to medium systems, such as 3, 5 and 6, do the best on this performance
indicator, but they also show the most variation. This suggests that small to medium
size favors good performance but does not guarantee it. Overall, this
performance measure clearly differentiates between groups in expected ways.

FIGURE 4-13. THE RANGE FOR COST EFFICIENCY BY PEER GROUP

Service Utilization

The pattern for service utilization is the inverse of that for cost efficiency, with
poor performers on cost efficiency doing best on service utilization. The peer
groups of large systems, 11 and 12, carry the most passengers per service hour.
Some of the peer groups with smaller systems, such as 5, 7 and 8, do not even overlap
group 12. However, some peer groups with fairly large members (e.g., Group 9) do
slightly worse than the smaller systems of Group 4. Unlike many of the other
performance indicators, practically every peer group shows wide variation in
performance. Despite this variation, the pattern for service utilization
clearly varies by peer group. Since this performance indicator and the one
measuring cost efficiency give radically different versions of which transit systems
are performing best, the need to use a multi-faceted approach to measuring
performance is demonstrated. Researchers who advocate a single measure of
performance, even if it is a mixed measure of efficiency and effectiveness statistics
like cost per passenger, mask important differences in performance.^

FIGURE 4-14. THE RANGE FOR SERVICE UTILIZATION BY PEER GROUP
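
The way a mixed measure hides these differences can be written out. The identity below assumes that cost per passenger is taken as operating expense per unlinked passenger trip (OEXP/TPAS); that definition is an assumption made here for illustration and is not stated in this section.

```latex
\[
\frac{\mathrm{OEXP}}{\mathrm{TPAS}}
  = \frac{1}{\left(\dfrac{\mathrm{RVH}}{\mathrm{OEXP}}\right)
             \left(\dfrac{\mathrm{TPAS}}{\mathrm{RVH}}\right)}
\]
```

Under that assumption, cost per passenger depends only on the product of cost efficiency and service utilization. A system with twice the cost efficiency and half the service utilization of another would report the same cost per passenger, although the two perform very differently on each dimension.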


Revenue  Generation 

The amount of revenue generated by passenger fares and auxiliary sources of
earned income (such as advertising on buses) reveals a pattern distinct from that
of the previous two performance indicators. All the peer groups cover a relatively
wide range of values, and the pattern of high and low performers shown here is not
reflected on any other measure. Peer Groups 1 and 12 both excel, but probably for
quite different reasons. Group 12 systems have extremely high passenger loads and
thus are able to generate passenger revenue by volume of passengers. Group 1
systems, on the other hand, are express commuter service providers which charge
high fares and have a low level of subsidization. Group 5 is somewhat unusual in
that it combines commuter and inter-city service providers, and as a result it has
the most variation on this measure.

^Timothy A. Patton, Transit performance indicators. Transportation Systems
Center Staff Study #SS-67-U.5-01. (Cambridge, Mass.: U.S. Department of
Transportation, Transportation Systems Center, 1983.)

FIGURE 4-15. THE RANGE FOR REVENUE GENERATION BY PEER GROUP

The lower performing groups (3, 4, 6 and 9) tend to have low peak to base ratios,
but some other groups with low peak to base ratios do average or better (e.g., 2).
For most peer groups, the measure of revenue generation has a distinctive pattern,
but groups 2, 5, 7, and 8 are not easily differentiated from each other because they
are each so varied.


FIGURE 4-16. THE RANGE FOR LABOR EFFICIENCY BY PEER GROUP

Labor Efficiency

The  pattern  for  labor  efficiency  is  quite  similar  to  that  for  cost  efficiency;  the 
very  small  systems  (1  and  2)  and  the  very  large  systems  (11  and  12)  show  the  lowest 
labor  efficiency.  Small  to  medium  size  systems,  particularly  in  groups  3,  5  and  6, 
are  the  most  efficient  but  each  group  has  members  that  are  quite  inefficient.  In 
fact,  the  poor  performing  members  of  those  groups  are  less  labor  efficient  than  any 
of  the  members  of  the  peer  groups  whose  average  efficiency  is  the  lowest. 

Unexpectedly, a high peak to base ratio does not result in lower labor
efficiency. Group 8 has the highest peak to base ratio by far but exactly average
labor efficiency. Group 5 also has a relatively high peak to base ratio but is the
most labor efficient group.

Although  there  is  some  overlap  among  the  peer  groups  on  this  measure,  the 
pattern  of  average  scores  and  distributions  of  values  within  peer  groups  demonstrate 
the  usefulness  of  this  measure. 


FIGURE 4-17. THE RANGE FOR VEHICLE EFFICIENCY BY PEER GROUP

[Range bars for Peer Groups 1 through 12 and the nation. Horizontal axis: TVM/PVEH.
Vehicle Efficiency = Total Vehicle Miles (10,000's) per Peak Vehicle.
X on bar shows peer group mean. X in circle shows national mean.]


Vehicle  Efficiency 

Different  peer  groups  display  quite  different  levels  of  vehicle  efficiency.  Peer 
Group  8,  which  is  characterized  by  a  high  peak  to  base  ratio,  averages  very  low  on 
vehicle  efficiency.  This  is  not  surprising  considering  that  with  an  average  peak  to 
base  ratio  of  nearly  3,  only  a  third  of  the  buses  employed  by  these  systems  are  in  use 
for  more  than  a  few  hours  a  day.  Other  peer  groups  with  relatively  high  peak  to 
base  ratios  (11  and  7)  also  display  relatively  low  vehicle  efficiency.  Peer  Group  5 
also  has  low  vehicle  efficiency  but  as  a  consequence  of  its  very  slow  speed.  In  this 
peer  group,  most  buses  are  in  use  throughout  the  day  but  they  generate  relatively 
few  miles  each. 

FIGURE 4-18. THE RANGE FOR MAINTENANCE EFFICIENCY BY PEER GROUP

The peer groups with good vehicle efficiency get nearly twice as many miles
per peak vehicle as the low performers. Groups 2, 7 and 9 stand out as having good
performance. Not only do they average high vehicle efficiency, but their poorest
performing members tend to do better than the average members of all other peer
groups.

This  measure  captures  major  distinctions  between  peer  groups  since  peer  groups 
vary  markedly  on  both  average  performance  and  range. 

Maintenance  Efficiency 

Maintenance  efficiency  is  notable  for  its  great  range.  The  values  range  from 
10,000  total  vehicle  miles  to  250,000  total  vehicle  miles  per  maintenance  employee. 
The peer groups with smaller transit systems (2, 3, 6, 7) are most likely to encompass
a  wide  range  of  values.  Since  maintenance  can  be  done  in-house  by  transit 
employees  or  as  an  outside  service,  caution  must  be  used  when  comparing  systems 
on  this  measure.  The  peer  groups  of  smaller  systems  show  the  greatest  diversity  on 
this  measure  because  small  systems  are  less  likely  to  keep  a  full  maintenance  staff. 
However,  for  larger  systems  this  performance  indicator  is  a  reliable  measure  of 
maintenance  efficiency. 

Although  the  larger  system  peer  groups  (8-12)  have  lower  average  efficiency, 
this  is  in  part  a  consequence  of  different  maintenance  arrangements  as  noted 
above.  Several  peer  groups  with  smaller  systems  (3  and  5)  have  the  lowest 
performing  systems  suggesting  that  in-house  maintenance  can  be  less  efficient 
under  some  circumstances.  Peer  Group  5,  which  is  the  slowest  peer  group,  also 
shows  lower  maintenance  efficiency — in  part  because  the  vehicles  travel  fewer 
miles  relative  to  service  hours. 

Safety 

Like maintenance efficiency, safety shows great variation in some peer groups
and great consistency in others. The peer groups of smaller companies, 2-7, tend to
have greater safety, even when extreme outlying cases are eliminated. Peer Group
2 is the safest, probably because it consists of small companies operating in
relatively uncongested areas as shown by their high average operating speed. Group 1,
the other group of fast systems, also shows superior safety.

Peer groups of larger systems (9-11) operate in denser urban areas and tend to
have a relatively high proportion of their vehicles operating during the congested
peak hours, and thus have the lowest safety. Slow systems (5) and those operating
most during congested peak hours (8) also show lower performance.

FIGURE 4-19. THE RANGE FOR SAFETY BY PEER GROUP

This  safety  measure  must  be  used  with  some  caution,  particularly  for  peer 
groups  with  extreme  values,  because  it  is  likely  that  not  all  systems  used  the  same 
definitions  of  accidents.  Despite  this  caution,  safety  measures  do  seem  to  reflect 
patterned  differences  between  peer  groups. 

THE  RELATION  OF  OPERATING  AND  PERFORMANCE  VARIABLES 

The  preceding  sections  have  implicitly  linked  certain  operating  characteristics 
to  specific  aspects  of  performance.  For  instance,  it  was  noted  that  peer  groups  of 
large  systems  have  the  lowest  cost  efficiency.  This  section  of  the  chapter  will 
examine  the  relations  between  operating  and  performance  variables  in  a  more 
systematic way. First, each performance indicator will be examined to discover
which operating characteristics can affect performance on that measure. Then a
brief analysis will be done to show that operating characteristics help shape the
performance profile of each peer group across the seven performance measures.

Correlation  Analysis 

Although it is easy to link one operating characteristic at a time to a
performance indicator, this is an overly simplistic view of how the peer groups
actually relate to the operating characteristics. Most of the peer groups were
formed through the interaction of the operating characteristics. Thus, except in
extreme cases, several characteristics of a transit system must be considered before
it is clear which peer group it belongs to. It is therefore not totally accurate to say
that it is the largeness of transit systems that influences performance in such and
such a way, because the performance of the peer groups also varies in accordance
with their speed and peak to base ratios. To further complicate matters, the
operating characteristics are not independent of one another. Large systems also
tend to have higher peak to base ratios. So it may look like size is a crucial factor
in a specific kind of performance when in fact the crucial factor is the peak to base
ratio, or even that both large size and a high peak to base ratio are necessary to
produce a particular kind of performance.

A series of simple and multiple regression and correlation analyses was performed
to disentangle the effects of the four operating characteristics on the seven
performance indicators (see technical details in Appendix H). Table 4-3 shows an
overview of the correlation results. Under each performance indicator, the
operating variables that significantly and independently correlate with it are listed.
They are given in the order of their relative importance.

Further work needs to be done on the complex interrelations shown in Table
4-3, but some tentative conclusions can be drawn. The general results of one
multiple regression analysis are shown in Table 4-3 and discussed below.
Correlation coefficients and beta weights are not shown since these vary between
analyses.
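
How a multiple regression separates correlated operating characteristics can be illustrated with a small sketch. This is not the SPSS analysis reported in Appendix H; the helper names and all data values are hypothetical. In the made-up data the outcome depends only on the peak to base ratio, yet fleet size is strongly correlated with both.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def beta_weights(predictors, outcome):
    """Standardized regression coefficients of outcome on the predictors."""
    zs = [standardize(p) for p in predictors]
    zy = standardize(outcome)
    n = len(zy)
    # Normal equations on z-scores: correlations among the predictors on the
    # left, correlations of each predictor with the outcome on the right.
    rxx = [[sum(a * b for a, b in zip(zi, zj)) / n for zj in zs] for zi in zs]
    rxy = [sum(a * b for a, b in zip(zi, zy)) / n for zi in zs]
    return solve(rxx, rxy)

# Hypothetical systems: larger fleets tend to have higher peak to base ratios.
peak_vehicles = [10.0, 10.0, 20.0, 30.0, 30.0, 50.0]
peak_to_base = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
outcome = [3.0 * r for r in peak_to_base]  # depends on the ratio only

betas = beta_weights([peak_vehicles, peak_to_base], outcome)
print([round(b, 3) for b in betas])
```

Taken alone, fleet size correlates strongly with the outcome in this example, but its beta weight is essentially zero once the peak to base ratio is included. That is the kind of distinction the paragraph on interrelated characteristics describes.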

As  shown  in  the  table,  each  operating  characteristic  contributes  to  differences 
in  performance  in  significant  ways.  Each  appears  to  be  important  for  several 
different  aspects  of  performance  as  well.  The  size  of  a  transit  system  appears  to  be 
its  most  important  characteristic.  The  number  of  peak  vehicles  not  only  affects 
performance  on  six  of  the  performance  indicators,  it  is  the  most  important  for  four 
of them. Total vehicle miles, another measure of size, also gives an independent
contribution to six performance indicators. However, its contribution is relatively
small in each case.

TABLE 4-3. OPERATING VARIABLES THAT CORRELATE WITH
PERFORMANCE INDICATORS

Cost Efficiency         Service Utilization     Revenue Generation
(RVH/OEXP)              (TPAS/RVH)              (OREV/OEXP)

Peak Vehicles           Peak Vehicles           None
Speed                   Peak to Base Ratio
Total Vehicle Miles     Total Vehicle Miles
                        Speed

Labor Efficiency        Vehicle Efficiency      Maintenance Efficiency
(TVH/EMP)               (TVM/PVEH)              (TVM/MNT)

Speed                   Peak to Base Ratio      Peak Vehicles
Peak Vehicles           Speed                   Total Vehicle Miles
Total Vehicle Miles     Total Vehicle Miles
                        Peak Vehicles

Safety
(TVM/ACC)

Peak Vehicles
Total Vehicle Miles

Speed  also  has  major  importance.  It  contributes  to  performance  on  four  of  the 
seven  measures  and  is  most  important  for  labor  efficiency.  Peak  to  base  ratio 
appears  to  have  the  least  importance.  It  appears  for  only  two  of  the  performance 
indicators.  However,  it  is  the  most  important  operating  characteristic  in  relation  to 
vehicle efficiency. The importance of speed and peak to base ratio may be
underestimated in these results, because in other analyses they are much more
important. However, across all analyses they were found to have some importance
and  cannot  be  overlooked  when  analyzing  how  operating  conditions  affect 
performance. 

The  type  of  correlational  analysis  done  here  cannot  capture  other  apparent 
features  of  the  relationship  between  operating  characteristics  and  performance.  In 
some  instances  the  size  of  a  transit  system  has  a  negative  effect  on  performance  at 
both  large  and  small  extremes  of  system  size.  These  results  do  not  adequately  assess 
that  relationship.  Also,  there  appear  to  be  threshold  effects.  Very  high  speed,  as 
exhibited  in  Peer  Group  1,  seems  to  affect  performance,  but  below  this  level  it  is 
much less important. The correlation analysis cannot capture this type of situation
either. However, the earlier discussions of patterns in the data point out these
relationships, and they appear to be valid.
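
The limitation can be seen in a small sketch. The values below are hypothetical and are chosen so that performance is worst at both extremes of size; the function name `pearson` is introduced here for illustration only.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical size classes 1..7; efficiency peaks at the middle size class
# and falls off toward both the smallest and the largest systems.
size = [1, 2, 3, 4, 5, 6, 7]
efficiency = [10 - (s - 4) ** 2 for s in size]

r = pearson(size, efficiency)
print(f"r = {r:.3f}")
```

Size fully determines efficiency in this example, yet the linear correlation is zero because the gains on one side of the peak cancel the losses on the other. A linear analysis would therefore report no relationship where a strong curved one exists.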

Notably, revenue generation is not significantly predicted by any of the
operating characteristics. This is in accord with the patterns shown in Figure 4-15.
The averages for peer groups are scattered across the figure. In addition, each peer
group covers a broad range of values. For instance, in Peer Group 5, the highest
system generates 10 times as much proportional revenue from the fare box as the
lowest member of the group. This result is not surprising since fare levels are often
set by policy makers outside the transit system. A special local sales tax may
mandate lower fares, while most private bus companies do not have equivalent
access to many subsidies and must generate most of their money from operations.

Profile  Analysis 

Taken  one  at  a  time,  the  performance  indicators  do  show  significant 
relationships  to  operating  characteristics.  In  addition,  the  patterns  across  all  seven 
performance  indicators  further  confirm  the  individual  variable  findings.  To 
demonstrate  this,  the  performance  profiles  of  selected  peer  groups  were  compared 
to  show  how  operating  statistics  relate  to  overall  patterns  of  performance. 
Figure 4-20A compares the performance profiles of the three peer groups of largest
systems (10, 11 and 12). Figure 4-20B compares the performance profiles of three
peer groups of smaller systems (2, 5, 6). The profiles for the large systems are
similar: the strengths and weaknesses of each system are the same although the
magnitudes vary. The same holds true for the small groups. In addition, the
patterns for large and small systems are the inverse of each other. While large
groups are strong in service utilization (TPAS/RVH) and revenue generation
(OREV/OEXP), the small systems are weak. The small systems are above average in
vehicle efficiency (TVM/PVEH) and maintenance efficiency (TVM/MNT) and the
large systems are relatively lower in these areas.

The next set of graphs, Figures 4-21A and 4-21B, portrays a comparison of
relatively slow and fast groups. The overall profiles are much less similar than the
size  comparison.  However,  it  can  be  seen  that  the  fast  groups  are  most  similar  in 
their  high  vehicle  efficiency  (TVM/PVEH)  and  safety  (TVM/ACC)  and  the  slow 
groups  are  relatively  lower  on  those  attributes. 

The last comparison, Figures 4-22A and 4-22B, shows the peer groups with high
and low peak to base ratios. As with speed, these profiles are similar only in
specific ways. The high peak to base ratio groups have above average revenue
generation (OREV/OEXP) and generally low vehicle efficiency (TVM/PVEH). The
low peak to base groups are low on revenue generation (OREV/OEXP) and high on
vehicle efficiency (TVM/PVEH).

FIGURE 4-20A. PERFORMANCE PROFILES OF LARGE TRANSIT SYSTEMS
FIGURE 4-20B. PERFORMANCE PROFILES OF SMALL TRANSIT SYSTEMS
FIGURE 4-21A. PERFORMANCE PROFILES OF FAST TRANSIT SYSTEMS
FIGURE 4-21B. PERFORMANCE PROFILES OF SLOW TRANSIT SYSTEMS
FIGURE 4-22A. PERFORMANCE PROFILES OF HIGH PEAK TO BASE TRANSIT SYSTEMS
FIGURE 4-22B. PERFORMANCE PROFILES OF LOW PEAK TO BASE TRANSIT SYSTEMS

The  results  of  these  comparisons  confirm  the  correlation  results.  The  size  of 
transit  systems  has  a  broad  impact  on  performance,  and  peak  to  base  ratios  and 
speed  make  less  general,  more  specific  contributions  to  performance.  For  clarity's 
sake  only  some  of  the  peer  groups  have  been  shown  on  each  chart.  The  peer  groups 
with  average  size,  peak  to  base  ratio  or  speed  fall  somewhere  between  the  clear 
patterns  shown  for  these  figures.  Certain  exceptions  also  show  the  more  important 
influence of speed or peak to base ratios at the extremes. Figure 4-23 shows Peer
Group 1, the peer group with the smallest average size, in comparison with the other
small peer groups from Figure 4-20B. Peer Group 1's profile of performance differs
radically from the other small groups on the first four performance indicators. This
confirms the earlier finding that performance is a consequence of the joint impact
of the operating characteristics of bus companies.

FIGURE 4-23. PERFORMANCE PROFILES TO COMPARE PEER GROUP 1
TO OTHER SMALL SYSTEM PEER GROUPS

Aggregation  of  Peer  Groups 

The similarity of performance profiles, particularly the similarities based on
size comparisons as shown in Figures 4-20A and 4-20B, superficially suggests that
some peer groups may be so similar that they could form one peer group. However,
they are similar only in comparison to the rest of the nation. A close examination
of Figures 4-13 through 4-19 reveals that no two peer groups share both the same
mean and range of values on any performance indicator. Thus within its original
peer group a transit system could be performing well above the mean, but in an
aggregated peer group it could be operating below the mean.

Also,  as  stated  in  the  final  section  of  Chapter  3,  it  is  more  meaningful  to  assess 
how  well  transit  management  is  doing  relative  to  peers  who  begin  with  the  same 
operating characteristics, not relative to systems which are similar only in
performance. Aggregating peer groups on the basis of a similar performance profile would
undermine  the  use  of  this  system  of  performance  evaluation  as  a  tool  for  managers. 

However,  some  aggregation  of  peer  groups  can  be  done  through  a  slightly 
different  interpretation  of  the  cluster  analysis  based  upon  operating  characteristics. 
Peer  groups  3  and  4  are  most  similar  and  could  form  one  peer  group.  These  peer 
groups differ only in terms of their average speed, 16.2 mph for Peer Group 4 and
14.5 mph for Peer Group 3.

An  even  larger  aggregated  peer  group  can  be  formed  from  Peer  Groups  3,  4  and 
6.  Once  again,  these  peer  groups  differ  mostly  on  speed,  ranging  from  16.2  mph  for 
Peer  Group  4  to  12.2  mph  for  Peer  Group  6.  At  this  point,  however,  the  aggregated 
peer  group  would  have  96  members — over  a  third  of  the  transit  systems  in  the 
analysis.  This  aggregated  peer  group  would  also  begin  to  reduce  the  importance  of 
speed  as  an  operating  characteristic  because  this  large  peer  group  would  cover 
almost  the  entire  range  of  speed.  The  performance  profiles  of  Peer  Groups  3  and  4 
also differ substantially because of their differences in speed. As major
performance differences would be lost through this aggregation, it is not recommended
that these peer groups be aggregated except for specific purposes where differences
caused by variations in average speed are irrelevant.

Based upon the cluster analysis, there are no other peer groups that can be
legitimately joined together without violating important differences in their
operating characteristics.

USES  OF  PERFORMANCE  EVALUATION 
Managerial  Uses 

The  performance  indicators  and  peer  groups  have  primarily  been  established  for 
the  use  of  transit  managers.  In  conjunction,  they  form  a  diagnostic  tool  with  which 
managers  can  pinpoint  problem  areas  within  their  own  transit  operation. 

Peer groups create sets of transit systems with similar operating conditions
that constrain performance. To date most peer comparisons have been limited to
within-state comparisons. When national sets have been created, they have often
resulted in comparisons between systems with dissimilar operating conditions. Now
a consistent set of peer groups has been created and methods outlined for placing
systems within each group.

For  each  peer  group,  a  set  of  norms  has  been  established  on  each  performance 
indicator.  These  norms  are  the  means  and  standard  deviations  listed  in  Appendix  G. 
Using  these  statistics,  a  transit  manager  can  review  performance  of  his/her  transit 
system,  and  determine  whether  it  has  been  functioning  above  or  below  the  mean.  If 
the  system  is  more  than  one  standard  deviation  above  the  mean,  then  that  system  is 
in  the  top  (roughly)  15%  of  its  peer  group. 
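
The use of a peer group norm can be sketched as follows. The example assumes that indicator values within a peer group are roughly normally distributed, which is what the "roughly 15%" figure implies; the function names and the sample mean and standard deviation are hypothetical and are not taken from Appendix G.

```python
from statistics import NormalDist

def z_score(value, peer_mean, peer_sd):
    """Number of standard deviations a system lies above its peer group mean."""
    return (value - peer_mean) / peer_sd

def share_above(z):
    """Share of a normally distributed peer group expected to score above z."""
    return 1.0 - NormalDist().cdf(z)

# Hypothetical norm for one indicator in one peer group.
peer_mean, peer_sd = 0.040, 0.010
system_value = 0.050

z = z_score(system_value, peer_mean, peer_sd)
print(f"z = {z:.2f}; about {share_above(z):.0%} of peers expected to score higher")
```

Under the normality assumption, a system exactly one standard deviation above the mean has about 16% of its peers above it. Skewed peer groups, such as those with one or two extreme members, will depart from this figure.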

Of  course,  knowing  a  system's  problem  areas  within  the  seven  dimensions  of 
performance  does  not  automatically  reveal  the  underlying  causes.  For  instance,  a 
low  labor  efficiency  score  can  mean  many  things.  It  could  mean  that  drivers  are 
inefficiently  scheduled  to  cover  the  transition  from  the  peak  to  base  period.  Or  it 
could  mean  that  there  are  too  many  maintenance  personnel  relative  to  the  size  of 
the  fleet. 

In addition, a system's performance should be judged in relation to its own
objectives and policies. A transit system which wishes to optimize the reliability
and comfort of bus service might choose to be less maintenance efficient in pursuit
of these alternate goals.

The peer groups and performance indicators also give transit managers a way of
evaluating their performance over time. Secular trends, such as inflation or a
nationwide increase in ridership because of a gas crisis, make it hard to directly
compare performance from year to year. However, comparing a system's
standing in relation to the other members of a peer group over time corrects for
many of the external changes.
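
A small sketch shows why relative standing is less sensitive to such trends. It assumes, for illustration only, that an external change scales every peer's indicator by the same factor; the function name `standing` and all values are hypothetical.

```python
from statistics import mean, pstdev

def standing(value, peers):
    """Standing of one system within its peer group, in standard deviations."""
    return (value - mean(peers)) / pstdev(peers)

# Hypothetical indicator values for four peers in year 1.
year1 = [20.0, 25.0, 30.0, 45.0]
# Year 2: a uniform 10% shift hits every peer alike (an assumed secular trend).
year2 = [v * 1.10 for v in year1]

s1 = standing(year1[3], year1)
s2 = standing(year2[3], year2)
print(f"raw change: {year2[3] - year1[3]:+.1f}; standing: {s1:.3f} -> {s2:.3f}")
```

The raw value changes from one year to the next, but the system's standing among its peers does not. Real trends rarely affect all systems equally, so the correction is partial rather than exact.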

Other research has shown that different kinds of transit systems change in
different ways over time.^10 Peer groups allow comparison of transit systems that
are expected to change in similar ways.

Research  Uses 

This  system  of  performance  evaluation  provides  a  useful  research  tool  in 
several  ways.  The  key  performance  indicators  provide  a  non-arbitrary  set  of 
performance  measures  whose  validity  and  reliability  have  been  established. 

The  peer  groups  are  useful  in  research  because  they  control  for  basic 
differences  between  types  of  transit  systems.  For  instance,  a  study  comparing  the 
relative  benefits  of  hiring  private  management  firms  would  find  invalid  results  if 
private  management  was  used  only  by  small  and  comparatively  efficient  systems. 
Making  such  comparisons  within  peer  groups  provides  a  clearer  picture  of  the  special 
benefits  or  problems  with  private  management. 

The  peer  groups  are  also  useful  for  research  comparison  over  time.  For 
instance,  it  is  useful  to  look  at  whether  different  types  of  transit  systems  respond  to 
government  assistance  in  different  ways.  The  results  from  recent  studies  on  the 
effects  of  subsidies  in  transit  would  have  been  more  convincing  had  the  longitudinal 
studies  been  selected  from  the  same  peer  group. 


^10 Leland C. Barbour and Robert J. Zerillo, Transit Performance in New York
State. (Albany: New York Department of Transportation, 1981.)

^11 John Pucher, Anders Markstedt and Ira Hirschman, Impacts of subsidies on
the costs of urban public transit. Journal of Transportation Economics and Policy,
1983, 17(2), 155-176.

Future Applications

This  system  of  evaluation  for  transit  performance  is  being  taught  to  transit 
managers  in  a  series  of  workshops,  as  noted  in  the  introduction  to  this  report.  An 
instructional  manual  is  being  prepared  which  will  explain  the  performance  indicators 
and  peer  groups  in  terms  of  their  use  in  performance  evaluation  using  simple 
statistical  and  microcomputer  applications.  Case  studies  present  examples  of  what 
can  be  learned  through  performance  evaluation. 

CONCLUSION 

This  chapter  has  integrated  the  results  of  Chapters  One  through  Three  in  a 
number  of  ways.  Each  peer  group  was  shown  to  have  a  distinctive  performance 
profile  across  the  seven  performance  indicators.  When  each  performance  indicator 
is  examined  in  detail,  the  peer  groups  are  shown  to  differ  significantly  not  only  in 
their  average  performance  but  also  in  the  distribution  of  values  on  each  indicator 
variable. 

The operating characteristics relate to performance in complex ways. The size
of a transit system is the most important of the operating characteristics, being of
major importance in explaining variation on four performance indicators. Speed and
peak-to-base ratio are important for differences on the three other
performance indicators. Each operating characteristic also contributes in small but
important ways to differences in performance on indicators where it is not the
most important influence.
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APPENDIX  A 

SUMMARY  OF  TSC  DATA  FILES  AND  VARIABLES  REORGANIZED 


TSC  FILE  NAME 


VARIABLES 


NRSDWK* 


Number  of  Vehicles  in  Operation 


Total  Vehicle  Miles 
Total  Vehicle  Hours 
Total  Revenue  Miles 
Total  Revenue  Hours 
Revenue  Capacity  Miles 
Unlinked  Passenger  Trips 
Unlinked  Passenger  Miles 
Average  Time  per  Unlinked  Trip
Vehicle  Operators  Full-time 
Vehicle  Operators  Part-time 
Total  Service  Persons 


Total  Employees  Capital  Labor

Transportation  Executive,  Professional  and  Supervisory  Personnel  Operating  Labor
Transportation  Executive,  Professional  and  Supervisory  Personnel  Capital  Labor
Transportation  Support  Personnel  Operating  Labor
Transportation  Support  Personnel  Capital  Labor
Revenue  Vehicle  Operators  Operating  Labor
Revenue  Vehicle  Operators  Capital  Labor
Maintenance  Executive,  Professional  and  Supervisory  Personnel  Operating  Labor
Maintenance  Executive,  Professional  and  Supervisory  Personnel  Capital  Labor
Maintenance  Support  Personnel  Operating  Labor
Maintenance  Support  Personnel  Capital  Labor
Revenue  Vehicle  Maintenance  Mechanics  Operating  Labor
Revenue  Vehicle  Maintenance  Mechanics  Capital  Labor
Other  Maintenance  Mechanics  Operating  Labor
Other  Maintenance  Mechanics  Capital  Labor
Vehicle  Servicing  Personnel  Operating  Labor
Vehicle  Servicing  Personnel  Capital  Labor
General  Administration  Executive,  Professional  and  Supervisory  Personnel  Operating  Labor
General  Administration  Executive,  Professional  and  Supervisory  Personnel  Capital  Labor
General  Administration  Support  Personnel  Operating  Labor
General  Administration  Support  Personnel  Capital  Labor

*All  variables  originating  in  file  NRSDWK  had  three  versions,
one  each  for  a  typical  weekday,  Saturday,  and  Sunday.


MOPRTN 


Number  of  Revenue  Vehicles 


EMPSCH 


Total  Employees  Operating  Labor 
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TSC  FILE  NAME  VARIABLES 


□WSS 


NRSTDY 


BUSWAY 


MNPENC 


ACCDNT 

UAREA 

WESPSC 

REVSCH 


Total  Operating  Time 

Platform  Time  Line  Service 

Total  Nonoperating  Paid  Work  Time 

Number  of  Vehicles  in  Operation  AM  Peak 
Number  of  Vehicles  in  Operation  Midday 
Number  of  Vehicles  in  Operation  PM  Peak 
Total  Vehicle  Miles  AM  Peak 
Total  Vehicle  Miles  Midday 
Total  Vehicle  Miles  PM  Peak 
Total  Vehicle  Revenue  Miles  AM  Peak
Total  Vehicle  Revenue  Miles  Midday 
Total  Vehicle  Revenue  Miles  PM  Peak 
Total  Vehicle  Revenue  Hours  AM  Peak 
Total  Vehicle  Revenue  Hours  Midday 
Total  Vehicle  Revenue  Hours  PM  Peak 
Vehicle  Operators  Full  Time  AM  Peak 
Vehicle  Operators  Full  Time  Midday 
Vehicle  Operators  Full  Time  PM  Peak 
Vehicle  Operators  Full  Time  AM  Peak 
Vehicle  Operators  Part  Time  PM  Peak 
Vehicle  Operators  Part  Time  Midday 
Vehicle  Operators  Part  Time  PM  Peak 

Directional  Miles  on  Exclusive  Right  of  Way 
Directional  Miles  on  Controlled  Access  Right  of  Way 
Directional  Miles  on  Mixed  Traffic  Right  of  Way 

Total  Roadcalls 

Kilowatt  Hours  of  Propulsion  Power 
Gallons  of  Diesel  Fuel 
Gallons  of  Gasoline 
Gallons  of  LPG  or  LNG 
Gallons  of  Bunker  Fuel 

Total  Accidents  (Computed) 

Urban  Area  Population 

Total  Hours  of  Operation  Saturday 
Total  Hours  of  Operation  Sunday 

Passenger  Fares  for  Transit  Service 

Special  Transit  Fares 

School  Bus  Service  Revenues 

Freight  Tariffs 

Charter  Service  Revenues 

Auxiliary  Transportation  Revenues 

Non-Transportation  Revenues 

Taxes  Levied  by  Transit  System 

Local  Cash  Grants  and  Reimbursements 


TSC  FILE  NAME  VARIABLES 


REVSCH,  continued   Local  Special  Fare  Assistance 

Federal  Cash  Grants  and  Reimbursements 
Subsidy  from  other  Sections  of  Operations 
Total  Revenue 

Total  Federal  Assistance  for  Capital  Revenue 

State  General  Revenues 
State  Dedicated  Revenues 
State  Total  Assistance 
Local  General  Revenues 
Local  Dedicated  Revenues 
Local  Total  Assistance 

Total  Federal  Assistance  for  Operating  Revenue 

State  General  Revenues 
State  Dedicated  Revenues 
State  Total  Assistance 
Local  General  Revenues 
Local  Dedicated  Revenues 
Local  Total  Assistance 

XDMOF/XTFO  Operators  Sal  and  Wgs  Veh  Opr 

Other  Sal  and  Wgs  Veh  Opr 
Fringe  Benefits  Veh  Opr 
Services  Veh  Opr 
Operators  Sal  and  Wgs  Veh  Maint 
Other  Sal  and  Wgs  Veh  Maint 
Fringe  Benefits  Veh  Maint
Services  Veh  Maint
Operators  Sal  and  Wgs  Nonveh  Maint
Other  Sal  and  Wgs  Nonveh  Maint
Fringe  Benefits  Non  Veh  Maint
Services  Non  Veh  Maint
Operators  Sal  and  Wgs  Genl  Admin
Other  Sal  and  Wgs  Genl  Admin
Fringe  Benefits  Genl  Admin
Services  Genl  Admin 

XMFT/XF  Total  Veh  Operation  Expense 

Total  Veh  Maintenance  Expense 
Total  Nonveh  Maintenance  Expense 
Total  Genl  Admin  Expense 

XO  Total  Expenses  for  Published  Reports 

TRSYS  Total  System  Operating  Expense  from  Form  301 

Transit  System  ID  Number 
Transit  System  name 
Single  or  Multimode 




FGCA 
NFGCA 


FGRA 
NFGRA 
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APPENDIX  B 


VARIABLES  AND  THEIR  RESPECTIVE  WEIGHTING  FACTOR 
TO  DISAGGREGATE  MOTOR  BUS  STATISTICS 


Variable                                 Weighting Factor

Passenger Revenue                        Motor bus passengers/total passengers
Special Transit Fare                     Motor bus passengers/total passengers
School Bus Revenue                       All designated as motor bus
Freight Tariffs                          All designated as motor bus
Charter Service                          All designated as motor bus
Auxiliary Revenue                        Number of motor bus vehicles/total vehicles
                                         (excluding demand responsive vehicles)
Non-transportation Revenue               Motor bus operating expense/total operating expense
Taxes Levied by Transit System           Motor bus operating expense/total operating expense
All Cash Grants (state, local, Fed.)     Motor bus operating expense/total operating expense
Total Employee Wages                     Motor bus employees/total employees
Revenue Vehicle Operator Wages           Motor bus drivers/total drivers
Revenue Vehicle Maintenance Wages        Motor bus maintenance employees/total maintenance employees
Non-revenue Vehicle Maintenance Wages    Motor bus maintenance employees/total maintenance employees
Fringe Benefits                          Motor bus employees/total employees
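
The weighting factors above are simple proportions, so the disaggregation of a system-wide figure to its motor bus share can be sketched in a few lines. The function name `motor_bus_share` and the numbers are hypothetical and serve only to illustrate the arithmetic; they are not taken from the study's programs or data.

```python
def motor_bus_share(system_total, motor_bus_count, total_count):
    """Allocate a system-wide figure to motor bus operations.

    The weighting factor is motor_bus_count / total_count, for example
    motor bus passengers divided by total passengers when the figure
    being allocated is passenger revenue.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return system_total * motor_bus_count / total_count


# Hypothetical multimode system: 900,000 of 1,200,000 passengers ride
# motor buses, so three quarters of passenger revenue is allocated to
# motor bus operations.
passenger_revenue = motor_bus_share(1_000_000.0, 900_000, 1_200_000)
```

Variables listed as "all designated as motor bus" correspond to a weighting factor of one and need no allocation.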
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APPENDIX  C 
DESCRIPTIVE  STATISTICS  FOR  FOUR  DATA  SETS 
OF  PERFORMANCE  INDICATOR  VARIABLES 


TABLE C-1. DESCRIPTIVE STATISTICS FOR 48 PERFORMANCE INDICATOR
VARIABLES FROM RAW REPORTED DATA

            Number              Standard
Variable  of Cases      Mean   Deviation        Variance   Skewness   Kurtosis

TVH/EMP        275      .116        .054            .001      3.242     26.160
RVH/OEMP       274      .167        .043            .002      1.693     10.335
TVM/EMP        274     1.532        .446            .119      2.307     16.321
PVEH/ADM       287     3.146       1.744           3.043      2.219      9.139
PVEH/OP        291      .579        .169            .028      1.367      3.703
PVEH/MNT       278     2.390       1.640           2.690      4.352     27.081
TVH/AVEH       280      .245        1.07            .011      2.868     13.845
TVH/PVEH       278      .317        .090            .008      1.387      6.977
TVM/AVEH       279     3.296       1.916           3.670      6.598     68.464
TVM/PVEH       277     4.241       1.463           2.140      2.105     12.108
RVM/TVM        279      .941        .075            .006     -2.783     11.343
RVM/FUEL       275     4.809       9.554          91.273     16.085    263.908
TVM/FUEL       273     5.143      10.581         111.955     16.098    263.567
TVM/MEXP       267     3.404       1.669           2.786      2.059      7.254
TVM/MNT        264     9.928       8.481          71.929      5.594     39.196
TVM/RCAL       272      .848       2.764           7.642     12.178    174.989
RVH/OEXP       274      .045        .015           .0002      1.204      3.063
TVM/OEXP       274      .631        .228            .052      1.414      3.374
RVH/TWG        274      .102        .295            .087     10.830    125.118
RVH/OWAG       268      .101        .060            .004      6.572     72.432
RVH/VMWG       250      .962       8.258          68.191     15.746    248.599
RVH/ADWG       270      .917        .761            .579      3.890     23.561
TPAS/RVH       238    32.848      16.452         270.668      1.012      1.395
TPAS/RVM       239     2.560       1.510           2.280      2.277     10.643
TPAS/PVH       248     9.354       5.236          27.411      1.522      4.350
PASM/TPS       231     4.556       4.756          22.619      6.309     52.941
TVM/ACC        264     2.359       2.344           5.495      3.904     19.440
RVH/ACC        264      .171        .197            .039      6.154     52.574
REV/PVEH       296     2.567       2.068           4.276      3.937     26.238
REV/RVH        279     9.124       7.551          57.019      3.034     12.405
OREV/RVH       279     9.258       7.579          57.438      3.004     12.213
REV/TPAS       249      .327        .313            .098      4.416     25.515
RVH/LSUB       250     3.190      37.707       1,421.806     15.508    243.274
RVH/SSUB       224    15.175     144.965      21,014.951     13.075    180.880
RVH/OSUB       279      .087        .099            .010      7.458     68.602
RVH/TSUB       279      .072        .100            .010      7.591     70.471
TPAS/LOA       227   159.545   2,001.678   4,006,716.275     14.968    224.953
TPAS/TSUB      249     2.237       5.622          31.601     15.069    231.922
REV/TSUB       302      .747       1.755           3.081      7.132     59.108
REV/OSUB       302      .862       1.745           3.045      7.082     58.705
PAS/OSUB       249     2.723       5.628          31.669     13.530    201.382
PAS/OEXP       246     1.332        .605            .366       .847      1.463
PASM/OEX       228     5.251       3.151           9.931      1.782      4.950
PAS/TWAG       246     2.846       8.285          68.636     12.059    156.202
PAS/FUEL       243    11.412      24.536         601.996     14.657    223.480
PASM/TEX       230     5.049       3.194          10.202      1.884      5.350
REV/OEXP       297      .415       1.006           1.012     16.309    275.601
TREV/TEX       282     1.016        .216            .047      4.090     45.771


TABLE C-2: DESCRIPTIVE STATISTICS FOR 48 PERFORMANCE INDICATOR
VARIABLES FROM WEIGHTED REPORTED DATA

            Number              Standard
Variable  of Cases      Mean   Deviation        Variance   Skewness   Kurtosis

TVH/EMP        275      .116        .054            .001      5.242     26.160
RVH/OEMP       274      .167        .045            .002      1.695     10.555
TVM/EMP        274     1.552        .446            .119      2.507     16.521
PVEH/ADM       287     5.146       1.744           5.045      2.219      9.159
PVEH/OP        291      .579        .169            .028      1.567      5.705
PVEH/MNT       278     2.590       1.640           2.690      4.552     27.081
TVH/AVEH       280      .245        1.07            .011      2.868     15.845
TVH/PVEH       278      .317        .090            .008      1.587
TVM/AVEH       279     3.296       1.916           5.670      6.598     68.464
TVM/PVEH       277     4.241       1.465           2.140      2.105     12.108
RVM/TVM        279      .941        .075            .006     -2.785     11.545
RVM/FUEL       275     4.809       9.554          91.275     16.085    265.908
TVM/FUEL       273     5.145      10.581         111.955     16.098    265.567
TVM/MEXP       267     5.404       1.669           2.786      2.059      7.254
TVM/MNT        264     9.928       8.481          71.929      5.594     59.196
TVM/RCAL       272      .848       2.764           7.642     12.178    174.989
RVH/OEXP       274      .045        .015           .0002      1.204      3.063
TVM/OEXP       274      .651        .228            .052      1.414      3.574
RVH/TWG        274      .102        .295            .087     10.850    125.118
RVH/OWAG       268      .101        .060            .004      6.572     72.452
RVH/VMWG       250      .962       8.258          68.191     15.746    248.599
RVH/ADWG       270      .917        .761            .579      5.890     25.561
TPAS/RVH       238    32.848      16.452         270.668      1.012      1.595
TPAS/RVM       259     2.560       1.510           2.280      2.277     10.645
TPAS/PVH       248     9.554       5.256          27.411      1.522      4.550
PASM/TPS       251     4.556       4.756          22.619      6.509     52.941
TVM/ACC        264     2.559       2.544           5.495      5.904     19.440
RVH/ACC        264      .171        .197            .059      6.154     52.574
REV/PVEH       292     2.450       1.702           2.897      2.611     10.475
REV/RVH        276     8.720       6.611          45.708      5.111     14.918
OREV/RVH       276     8.854 (illegible)          44.051      5.065     14.527
REV/TPAS       249      .515        .507            .094      4.701     28.546
RVH/LSUB       246     5.256      38.012       1,444.885     15.582    259.548
RVH/SSUB       220    16.018     148.788      22,157.919     12.551    166.949
RVH/OSUB       275      .079        .072           0.005      7.195     64.990
RVH/TSUB       274      .096        .385            .147     14.994    257.576
TPAS/LOA       224   167.545   2,016.147   4,064,849.818     14.855    221.505
TPAS/TSUB      246     4.424      55.277       1,107.526     15.069    251.922
REV/TSUB       295     1.414      10.846         117.654     16.450    276.681
REV/OSUB       293      .916       2.290           5.242      7.794     69.911
PAS/OSUB       244     2.549       1.685           2.851      2.194      6.665
PAS/OEXP       246     1.532        .605            .566       .847      1.465
PASM/OEX       228     5.251       5.151           9.951      1.782      4.950
PAS/TWAG       246     2.846       8.285          68.656     12.059    156.202
PAS/FUEL       243    11.412      24.556         601.996     14.657    225.480
PASM/TEX       230     5.049       5.194          10.202      1.884      5.530
REV/OEXP       295      .402        .970            .941     16.500    274.266
TREV/TEX       282     1.016        .216            .047      4.090     45.771


TABLE C-3: DESCRIPTIVE STATISTICS FOR LOGARITHMS (BASE 10) OF
PERFORMANCE INDICATOR VARIABLES FROM RAW REPORTED DATA

            Number              Standard
Variable  of Cases      Mean   Deviation        Variance   Skewness   Kurtosis

TVH/EMP        275    -0.951        .117            .014     -0.663      6.772
RVH/OEMP       274    -0.792        .114            .013     -1.130      7.221
TVM/EMP        274      .168        .127            .016     -1.058      6.799
PVEH/ADM       287      .440        .228            .052     -0.331      1.174
PVEH/OP        291    -0.254        .122            .015     -0.153      2.064
PVEH/MNT       278      .319        .216            .047       .228      3.694
TVH/AVEH       280    -0.642        .161            .026       .154      2.546
TVH/PVEH       278    -0.515        .123            .015     -0.427      2.075
TVM/AVEH       279      .479        .171            .029       .657      4.105
TVM/PVEH       277      .604        .142            .020     -0.236      2.033
RVM/TVM        279    -0.028        .041            .002     -3.741     20.206
RVM/FUEL       275      .613        .164            .027      3.192     32.415
TVM/FUEL       273      .642        .156            .024      4.106     40.678
TVM/MEXP       267      .488        .197            .039     -0.131      1.045
TVM/MNT        264      .927        .225            .051       .471      5.239
TVM/RCAL       272    -0.463        .485            .236       .942      1.199
RVH/OEXP       274    -1.370        .144            .021     -0.054       .300
TVM/OEXP       274    -0.226        .149            .022       .051       .619
RVH/TWG        274    -1.163        .252            .064      2.779     16.081
RVH/OWAG       268    -1.040        .187            .035       .324      2.731
RVH/VMWG       250    -0.430        .284            .081      3.518     26.557
RVH/ADWG       270    -0.133        .278            .077       .246       .555
TPAS/RVH       238     1.458        .240            .058     -0.752       .940
TPAS/RVM       239      .339        .257            .066     -0.582       .990
TPAS/PVH       248      .900        .266            .071     -0.818      1.223
PASM/TPS       231      .573        .231            .053      1.617      4.332
TVM/ACC        264      .260        .286            .082       .801       .868
RVH/ACC        264    -0.883        .282            .080       .923      1.710
REV/PVEH       296      .317        .281            .079     -0.127      1.636
REV/RVH        279      .864        .278            .077       .352       .752
OREV/RVH       279      .864        .278            .077       .352       .732
REV/TPAS       249    -0.582        .263            .069       .659      2.772
RVH/LSUB       250    -0.745        .567            .322      2.106      8.629
RVH/SSUB       223    -0.373        .751            .563      1.614      4.226
RVH/OSUB       279    -1.191        .252            .063       .000      3.893
RVH/TSUB       279    -1.282        .320            .102       .235      2.173
TPAS/LOA       227      .746        .598            .357      2.171      8.863
TPAS/TSUB      249      .150        .369            .136       .268      1.713
REV/TSUB       302    -0.432        .447            .200       .525      2.001
REV/OSUB       302    -0.351        .382            .146       .722      2.857
PAS/OSUB       249      .245        .300            .090     -0.184      0.347
PAS/OEXP       246      .074        .225            .051     -0.959      1.560
PASM/OEX       228      .647        .263            .069     -0.530       .953
PAS/TWAG       246      .280        .281            .079      1.403      9.783
PAS/FUEL       243      .947        .250            .063       .591      7.600
PASM/TEX       230      .617        .314            .099     -2.295     15.407
REV/OEXP       297    -0.496        .242            .059      1.126      8.350
TREV/TEX       282    -0.008        .172            .030    -11.675    172.255


TABLE  C-4:  DESCRIPTIVE  STATISTICS  FOR  LOGARITHMS  (BASE  10)  OF 
48  PERFORMANCE  INDICATOR  VARIABLES  FROM  WEIGHTED  REPORTED  DATA 


Number  Standard 


Variable  of 

Cases 

Mean 

Deviation 

Variance 

Skewness 

Kurtosis 

TWM/FMP 
1  V  n/  C_l»lr 

-n  9S ! 

1  1 7 

n  1 A 

-0  AA5 

U.OO  J 

A  779 
0.  /  >  z 

/  *+ 

-n  799 

ni  5 

- 1  1  50 

7  99  1 
/  .zz  1 

1  vivi/ c.ivif-' 

1 AR 

1  97 

ni  A 

—  1  .U  JO 

A  799 

D.  /77 

"  V  cn/  M  1^1*1 

AAfl 

99R 

ns9 

.U  JZ 

-0  '^^  1 

—U.J  J  1 

1    1  7A 

1  .  1  /  M 

^7  1 

_n  9SA 

I  99 

n  1 

n  1  "s^ 

— U.  1  J  J 

9  nz/i 

P\/FM/MMT 

97R 
£.10 

^  1  9 

9  1  A 

nA7 

99R 
.ZZO 

X  A9/i 

TV/W/ A  V/FI-l 
1  VM/M  VLri 

_n  AA9 

1  A  1 
.101 

n9A 

.  1  J«4 

9  "sAA 

z.  jmj 

TV/l-iypV/FM 

97P 

—U.J  i J 

1  9X 

n  1 

n  /ji97 
— u.**z  / 

9  07^^ 
Z.U  /  J 

TV/M/ AX/FM 

1  V  ivs/  M  V  cn 

97Q 

LI/ 

A79 

1  7  1 
.1/1 

n99 

.UZ  7 

A'^7 

.D  J  / 

A  1  n•^ 

*4. 1 U  J 

TVM/PVFH 

711 

£.11 

ADA 

n9n 

-0  95A 

U.Z  JD 

9  055 
Z.U  J  J 

979 

-D  n9R 

nn9 

-5  7A1 

—J. /Mi 

90  90A 
zu.zuo 

RV/M/Fl  IFI 

97S 

61  3 

1  AA 

097 

5  199 

J.  1  7Z. 

59  A  1  5 

JZ.M  1  J 

TV/M/FI  JFl 

97^^ 

AA9 

1  56 

n9A 

A  1  OA 

AO  A7R 

**U.D  /  0 

T\/M/MFyP 

9A7 

488 

197 

059 

-0151 

1  OA  5 

1  .U*4  J 

TVM/MNT 

927 

225 

051 

471 

5  259 

TVM/RCAL 

272 

-0.463 

485 

256 

942 

1  199 

1  •  A  /  / 

RVH/OEXP 

274 

-1.370 

.144 

021 

•  Kim-  1 

-0  054 

500 

TVM/OEXP 

274 

-0.226 

.149 

022 

051 

619 

RVH/TWG 

lllx 

-1  163 

252 

064 

2  779 

16  081 

RVH/OWAG 

268 

-1  040 

187 

055 

524 

2  75  1 

^*  f  J  k 

RVH/VMWG 

250 

-0  450 

284 

081 

5  518 

26  557 

RVH/ADWG 

270 

-0  133 

278 

077 

246 

555 

TPAS/RVH 

238 

i  458 

240 

058 

-0  752 

940 

TPAS/RVM 

239 

339 

257 

.066 

-0.582 

.990 

TPAS/PVH 

248 

900 

266 

071 

-0  818 

1  225 

p ASM/TPS 

9^1 

575 

251 

055 

1  617 

A          A  r 

4  552 

1  V  1^1/  ^ 

9AA 

9An 

089 

801 

868 

RVH/ACr 

9AA 

-0  885 

289 

080 

995 

1  710 

A  •  /  A 

RFV/PV/FH 

797 

507 

979 

074 

-0  590 

1  A9A 
1  .H  70 

RFV/RVH 

f^l— V/I\V(  1 

71S 

85  1 

261 

068 

084 

AAl 

•  MO  1 

nRFV/RV/H 
wrxC  V / IN  V  1  1 

97S 

R59 
.0^7 

9  An 

OAR 

OA  5 

AAR 

RFV/TPAc; 

_n  AOS 

—  U.DU  J 

9A5 

.Z*4  J 

059 
•  U  J  7 

•  *400 

5  ^AO 

J.  JMU 

RVH/I  SI  IR 

9AS 

-n  799 

569 

516 

?  187 

Cm  k\J  9 

R  RA9 
0*00^ 

En  V  is/  OJKJZD 

9  1  R 

_n  '^A5 

~U.  JM  J 

559 

f,  J  J  c. 

1  7nR 

1  .  /  UO 

A  AOA 

97^ 

- 1   1  9R 
—  i  . 1 zo 

9A9 

05R 

1  919 

fl  1  9S 
0. 1 Z  J 

RWi-J/TCI  ID 

97^ 

—  i  .Z  J  J 

1  9 

097 

.U7  / 

flA7 

.00  / 

^  997 

TPAQ/I  nA 

999 

£.£.£. 

7RR 

Ann 

5A0 

9  995 

R  AAn 
O.OmU 

TPAC/TQI  in 
1  KMD/  1  DUD 

0  /i  /i 

1  Q  1 

.J  /7 

1  /l/l 

R7A 
.0/0 

J  JD 

REV/TSUB 

293 

-0.406 

.465 

.217 

.909 

2.851 

REV/OSUB 

293 

-0.508 

.388 

.150 

1.118 

3.007 

PAS/OSUB 

244 

.279 

.286 

.082 

-0.146 

.352 

PAS/OEXP 

246 

.074 

.225 

.051 

-0.959 

1.560 

PASM/OEX 

228 

.647 

.263 

.069 

-0.550 

.953 

PAS/TWAG 

246 

.280 

.281 

.079 

1.403 

9.785 

PAS/FUEL 

243 

.947 

.250 

.063 

.591 

7.600 

PASM/TEX 

230 

.617 

.314 

.099 

-2.295 

15.407 

REV/OEXP 

293 

-0.496 

.242 

.059 

1.126 

8.330 

TREV/TEX 

282 

-0.008 

.172 

.030 

-11.675 

172.255 
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TABLE  E-5.     FACTOR  LOADING  MATRIX  FOR  FY  1979  FACTOR  ANALYSIS 
The above factor loading matrix has been rearranged so that the columns appear
in decreasing order of variance explained by factors. The rows have been
rearranged so that for each successive factor, loadings greater than .5000
appear first. Loadings less than .4500 have been replaced by zero.
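
The rearrangement described in this note can be sketched as follows. This is an illustration under stated assumptions, not the routine of the statistical package that produced Table E-5: the function name `tidy_loadings` is hypothetical, the variance explained by a factor is taken as the sum of its squared loadings, and the ordering of rows within a factor by decreasing magnitude is an assumption.

```python
def tidy_loadings(loadings, names, salient=0.50, zero_below=0.45):
    """Rearrange a factor loading matrix for display.

    loadings: list of rows, one row per variable, one column per factor.
    names:    variable names, in the same order as the rows.
    """
    n_factors = len(loadings[0])

    # Variance explained by each factor: sum of squared loadings (assumed).
    explained = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]
    col_order = sorted(range(n_factors), key=lambda j: -explained[j])
    reordered = [[row[j] for j in col_order] for row in loadings]

    def row_key(item):
        _, row = item
        # A variable is listed under the first factor on which it is salient.
        for j, value in enumerate(row):
            if abs(value) > salient:
                return (j, -abs(value))
        return (len(row), 0.0)  # variables with no salient loading go last

    rows = sorted(zip(names, reordered), key=row_key)

    # Small loadings are replaced by zero to make the pattern easier to read.
    return [
        (name, [v if abs(v) >= zero_below else 0.0 for v in row])
        for name, row in rows
    ]


# Hypothetical three-variable, two-factor example.
result = tidy_loadings(
    [[0.2, 0.9], [0.8, 0.1], [0.3, 0.7]],
    ["A", "B", "C"],
)
```

In the example the second input column explains more variance and therefore becomes the first displayed factor; variables A and C are listed under it, followed by B.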
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DESCRIPTIVE  STATISTICS  FOR  PERFORMANCE  PEER  GROUPS 
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1  .oAo 

1  no  I  A n 
lUZ.  1 4U 

1  .  J74 

IT  QO 
1  J.7Z  J 

               s.d.     0.840     118.674     0.530      1.946

Group 5 (36)   mean     1.588     119.343     1.930     13.068
               s.d.     0.957     155.199     0.650      1.568

Group 6 (66)   mean     2.319      33.015     1.473     13.441
               s.d.     1.852      40.147     0.572      2.410

Group 7 (15)   mean     1.902     188.533     2.446     13.270
               s.d.     1.504     284.301     0.928      4.271

unclustered    mean     2.609     498.429     1.326     12.253
(7)            s.d.     1.770    1269.923     0.630      4.562
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APPENDIX  F  (continued) 


TVH/EMP 

TVM/PVEH 

TVM/MNT 

Group  1 

mean 

0.094 

4.142 

6.616 

s.a. 

n  n  1 Q 

U.Ul  7 

n  0*70 

Group  2 

mean 

0.154 

4.431 

16.1992 

(15) 

S.a. 

U.U  J 1 

1  .job 

1  Q  CI  no 
1  7.  MUZ 

Group  3 

mean 

0.120 

4.412 

1  1.7037 

(1  ') 

c  A 
S.G. 

n  n9/i 

9  /i  1  1 

1  1 

Group  4 

mean 

0.1 14 

4.283 

10.0824 

c  A 

I  .uu^ 

Group  5 

mean 

0.1 14 

3.858 

8.2342 

c  A 

n  RP7Q 

U.O  7  /~ 

Group 6 (66)   mean     0.118     4.388     9.9930
               s.d.     0.020     1.236     4.6703

Group 7 (15)   mean     0.111     3.735     8.0744
               s.d.     0.024     1.692     3.0784

unclustered    mean     0.136     3.588     7.2172
(7)            s.d.     0.062     1.600     3.7552


APPENDIX G

DESCRIPTIVE STATISTICS FOR PERFORMANCE INDICATORS
BY PEER GROUP

Peer Group (n)           RVH/OEXP   TPAS/RVH   OREV/OEXP   TVH/EMP

 1  (2)     mean           .031       11.5        .64        .073
            s.d.           .008                   .32        .020
            minimum        .025       11.5        .42        .060
            maximum        .036       11.5        .87        .087

 2  (16)    mean           .040       32.1        .34        .095
            s.d.           .014       25.6        .21        .024
            minimum        .022        9.7        .11        .047
            maximum        .073       84.5        .81        .133

 3  (44)    mean           .051       30.1        .26        .118
            s.d.           .013       16.0        .09        .025
            minimum        .026        9.2        .11        .028
            maximum        .090       81.3        .47        .170

 4  (7)     mean           .042       37.9        .30        .112
            s.d.           .010       18.2        .14        .014
            minimum        .029       21.0        .09        .100
            maximum        .055       71.5        .48        .140

 5  (15)    mean           .056       24.6        .42        .145
            s.d.           .019       11.2        .29        .049
            minimum        .031        6.3        .08        .053
            maximum        .103       49.9       1.10        .229

 6  (45)    mean           .055       28.8        .31        .124
            s.d.           .017       13.2        .14        .031
            minimum        .031        5.4        .09        .031
            maximum        .121       73.5        .76        .220

 7  (78)    mean           .045       31.8        .36        .117
            s.d.           .010       10.6        .17        .017
            minimum        .030        5.0        .11        .055
            maximum        .074       58.0       1.11        .166

 8  (33)    mean           .04c       32.1        .34        .095
            s.d.           .012       14.2        .19        .023
            minimum        .020        7.1        .12        .045
            maximum        .073       54.8        .93        .170

 9  (8)     mean           .030       40.1        .24        .099
            s.d.           .009       15.4        .13        .023
            minimum        .015       19.0        .07        .074
            maximum        .045       72.2        .42        .14

10  (8)     mean           .035       46.9        .34        .106
            s.d.           .008       25.2        .11        .014
            minimum        .026       26.1        .19        .080
            maximum        .048       89.8        .49        .128

11  (13)    mean           .025       52.9       .348        .095
            s.d.           .003       10.1       .120        .012
            minimum        .02T)      36.0       .178        .066
            maximum        .030       69.0       .587        .116

12  (3)     mean           .026       74.Z       .581        .098
            s.d.           .001       14.1       .210        .014
            minimum        .025       58.5       .386        .085
            maximum        .027       83.4       .807        .113

Total       mean           .045       32.8       .337        .113
            s.d.           .015       16.5       .290        .030
            minimum        .015        5.4       .070        .028
            maximum        .121       89.8      1.100        .229
Peer Group               TVM/PVEH   TVM/MNT   TVM/ACC

 1          mean            3.0       15.3       2.9
            s.d.                        .1        .2
            minimum         3.0       15.2       2.7
            maximum         3.0       15.4       3.0

 2          mean            5.8       11.9       4.1
            s.d.            1.4        5.0       2.2
            minimum         3.5        5.8       1.5
            maximum         8.8       25.7       7.4

 3          mean            5.3        9.7       2.6
            s.d.             .9        3.9       1.5
            minimum         3.7        1.8        .8
            maximum         8.2       23.5       7.1

 4          mean            4.5        9.6       3.0
            s.d.             .8        3.6       2.0
            minimum         3.6        5.3        .9
            maximum         5.6       14.0       6.9

 5          mean            3.0        6.1       1.7
            s.d.             .7        2.8        .9
            minimum         1.4         .6        .4
            maximum         4.4       12.3       3.7

 6          mean            4.5        9.2       2.3
            s.d.             .9        3.7       1.6
            minimum         2.2        3.0        .6
            maximum         7.6       21.4       7.3

 7          mean            3.6        9.2       1.9
            s.d.             .7        3.4       1.2
            minimum         2.3        4.6        .7
            maximum         6.1       24.3       8.0

 8          mean            3.0        7.5       1.4
            s.d.             .6        2.7        .7
            minimum         1.6        2.5        .5
            maximum         4.3       15.6       3.8

 9          mean            5.7        7.5       1.5
            s.d.            1.1        2.2        .5
            minimum         4.4        3.7        .6
            maximum         8.0       11.0       2.2

10          mean            4.4        6.2       1.5
            s.d.             .5        2.8        .9
            minimum         3.7        3.7        .5
            maximum         5.1       11.5       3.3

11          mean            3.9        6.7      1.23
            s.d.             .6        2.0       .72
            minimum         3.1        2.6       .50
            maximum         5.0       11.0      3.00


12 


mean 
s  .  d  . 

minimum 
max  imum 


U  .  2 
1.1 
3.2 
5.  A 


5 
2 
3 
7 


4 
1 
2 
4 


00 
34 
74 
40 


                         TVM/PVEH   TVM/MNT   TVM/ACC

Total       mean            4.2       9.90      2.40
            s.d.            1.5       8.40      2.30
            minimum         1.1        .68       .48
            maximum         8.0      25.70      8.00


Note: The figures for the total are for the entire set of transit
systems reporting Section 15 data for FY1980. Thus some of the
transit systems included in the total are not in a peer group
because they were missing data and could not be assigned to a peer
group.
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APPENDIX  H 

TECHNICAL  NOTES  ON  THE  CORRELATIONAL  ANALYSIS 
OF  THE  RELATIONSHIP  BETWEEN  OPERATING 
CHARACTERISTICS  AND  PERFORMANCE 

The relationship between operating characteristics and performance was
explored in two major ways. In the first, the peer groups as a whole were the
units of analysis. This approach preserved the important distinctions in size, speed
and peak-to-base ratio made by the cluster analysis. In the second,
the individual transit systems were the units of analysis. This allowed more
complex multiple regression analyses because there were more cases and the data
were true interval-level data.

The peer group data were analyzed with a series of Spearman rank-order
correlations between each operating characteristic and each of the seven
performance indicators. The mean of each variable for each peer group was used as
the value in the correlation.
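
A Spearman rank-order correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below shows the calculation on peer group means; the function names and the numbers are hypothetical illustrations and do not reproduce the study's data or its software.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical peer group means: one operating characteristic (peak
# vehicles) and one performance indicator, one value per peer group.
mean_peak_vehicles = [12, 35, 60, 150, 400, 900]
mean_indicator = [0.056, 0.051, 0.055, 0.042, 0.035, 0.026]
rho = spearman(mean_peak_vehicles, mean_indicator)
```

Because only the rank order of the group means enters the calculation, the coefficient is unaffected by the extreme skewness of variables such as peak vehicles.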

Size as measured by peak vehicles was the most important variable, being
significantly correlated with four performance indicators. Of the performance
indicators, only Revenue Generation was not significantly correlated with any of the
four operating characteristics. Labor Efficiency was correlated only with speed.
Each of the other performance indicators was correlated with several operating
characteristics.

The rank-order correlation is not an entirely satisfactory description of the
relations between operating and performance variables. It does not correct for
correlations among the operating variables, some of which are substantial.
For instance, peak vehicles and total vehicle miles have a Pearson
correlation of .98. It is not clear whether each of these variables makes an
independent contribution to explaining variance in performance, or whether they are
essentially redundant.
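
The degree of redundancy implied by a correlation of .98 can be made concrete with a short calculation. The variance inflation factor used here is a standard regression diagnostic that is not part of the original analysis; it is introduced only as an illustration, and the function name is hypothetical.

```python
def variance_inflation_two_predictors(r):
    """Variance inflation factor when two predictors correlate at r.

    With exactly two predictors, the squared multiple correlation of one
    predictor on the other is r squared, so VIF = 1 / (1 - r**2).
    """
    return 1.0 / (1.0 - r * r)


r = 0.98                      # peak vehicles vs. total vehicle miles
shared_variance = r * r       # proportion of variance the two share
vif = variance_inflation_two_predictors(r)
```

At r = .98 the two variables share about 96 percent of their variance, and the sampling variance of either regression coefficient is inflated roughly 25-fold relative to uncorrelated predictors.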

The second phase of analysis involved a series of multiple regressions with the
four operating characteristics as independent variables and each performance
indicator as the criterion. Since the operating variables have extremely non-normal
distributions, as noted in Chapter 1, the regression analysis was done both with the raw
data and with log10 transformations of the independent variables. A stepwise
procedure was used in which the independent variables entered the equation in order
of their relative importance in accounting for variance in the criterion.
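
The effect of the log10 transformation on a strongly skewed operating variable can be illustrated with a minimal sketch. The sample of peak vehicle counts is hypothetical, and the moment-based skewness formula shown is one common definition; the statistical package used in the study may apply a slightly different small-sample correction.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical peak vehicle counts: many small systems, one very large one.
peak_vehicles = [10, 20, 40, 80, 160, 320, 2000]

raw_skew = skewness(peak_vehicles)
log_skew = skewness([math.log10(v) for v in peak_vehicles])
```

For this invented sample the skewness falls from nearly 2 on the raw scale to under 0.5 after transformation, which is the kind of improvement that motivates regressing on the logarithms.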

The results with the transformed variables are given in Chapter 4 in Table 4-3.
In summary, size as measured by number of peak vehicles was the most important
independent variable, although each operating variable contributed significant
explanatory power for several performance indicators. Only Revenue Generation
was not significantly correlated with operating characteristics. In general these
results are quite similar to those for the rank-order correlations.

The  results  with  the  untransformed  data  differ  in  several  ways  from  the  other 
correlation  results.  Peak  to  base  ratio  is  the  most  important  independent  variable, 
being  most  important  in  three  equations  and  entering  into  all  but  one  of  the 
equations.  The  correlation  coefficients  are  lower  with  the  untransformed  data, 
except  for  Revenue  Generation  which  reaches  significance  in  this  analysis. 

However, all the analyses are in accord with the conclusion that each of the
operating characteristics makes a significant contribution to explaining differences
in performance, and that performance in Revenue Generation is least related to
operating characteristics.

Before a definitive statement can be made on the relation between operating
characteristics and performance, several analytical issues need to be explored
further:

-  Were  the  optimal  transformations  done  on  the  data  for  each  variable? 

-  How  can  non-linear  relationships  be  better  described  in  this  context? 

-  Is there a problem with multicollinearity between the number of peak
vehicles and total vehicle miles?

-  How  can  the  numerous  suppression  effects  be  better  understood?  While  the 
most  common  suppression  was  between  peak  vehicles  and  total  vehicle  miles 
(which  is  not  unexpected  in  light  of  their  high  correlation)  there  was  evidence 
of  suppression  between  other  variables  as  well. 
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APPENDIX  I 

FY 1980 DATA FOR TRANSIT AGENCIES BY PEER GROUP

KEY


COLUMN  VARIABLE

 1  ID          : Transit System ID Number
 2  PEER GROUP  : Peer Group ID Number
 3  URBAN AREA  : Urban Area Code
 4  PVEH        : Peak Vehicles
 5  TVM         : Total Vehicle Miles
 6  SPEED       : Miles per Hour
 7  PKTOBS      : Peak to Base Ratio
 8  RVH/OEXP    : Revenue Vehicle Hours per Operating Expense
 9  TPAS/RVH    : Passenger Trips per Revenue Vehicle Hour
10  OREV/OEXP   : Weighted Operating Revenue per Operating Expense
11  TVH/EMP     : Vehicle Hours per Employee
12  TVM/PVEH    : Total Vehicle Miles per Peak Vehicle
13  TVM/MNT     : Vehicle Miles per Maintenance Employee
14  TVM/ACC     : Total Vehicle Miles per Accident


Notes:

1.  A '-9' indicates missing data.

2.  Total Vehicle Miles and Total Vehicle Hours are in units
    of 10,000.
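
Anyone recomputing indicators from the columns in this key needs to carry the missing-data code through each ratio. The sketch below shows one way to do so; the function name `ratio`, the record layout and the values are hypothetical, and only the '-9' convention and the units of 10,000 come from the notes above.

```python
MISSING = -9  # missing-data code used in this appendix


def ratio(numerator, denominator):
    """Return numerator / denominator, or MISSING if it cannot be computed."""
    if numerator == MISSING or denominator == MISSING or denominator == 0:
        return MISSING
    return numerator / denominator


# Hypothetical record. TVM is in units of 10,000 vehicle miles, so 420.0
# stands for 4,200,000 miles; MNT (maintenance employees) is missing.
record = {"TVM": 420.0, "PVEH": 100, "MNT": -9}

tvm_per_pveh = ratio(record["TVM"], record["PVEH"])  # in 10,000 miles per peak vehicle
tvm_per_mnt = ratio(record["TVM"], record["MNT"])    # stays coded as missing
```

Returning the missing code instead of a number keeps a system with incomplete data from contributing a spurious value to group means.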


If  your  copy  of  this  Appendix 
is  illegible,  request  ITS 
Working  Paper  83-5  from: 
Institute  of  Trans.  Studies 
Univ.  of  Calif.,  Irvine 
Irvine,  CA  92717
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